(54) Audio communication control unit

(57) In a channel branching part, input audio signals from at
least three terminals connected to a switching part are each
branched into left- and right-channel audio signals. In a sound
image control part, the left- and right-channel audio signals
are processed using sound image control parameters so as to
give them target positions that differ for each terminal. In a
mixing part, all left-channel audio signals corresponding to
the respective terminals are mixed together into a left-channel
mixed audio signal, and all the right-channel audio signals are
mixed together into a right-channel mixed audio signal. In a
terminal-associated branching part, these left- and
right-channel mixed audio signals are distributed and sent to
all the connected terminals.

Description

BACKGROUND OF THE INVENTION 

The present invention relates to an audio communication control
unit which controls the processing of audio signals in a
multi-point teleconference which involves audio communications,
such as an audio, video or multi-media conference that is held
via a communication network.

An audio conference system, a multi-point video conference
system or the like employs an audio communication control unit
that mixes together audio signals received from conference
participants after multiplying each audio signal by a weighting
coefficient corresponding to the number of simultaneously
speaking persons, for instance, and transmits the mixed audio
signal to each conference participant.

Conventional audio communication control units are those which
have means for mixing together audio signals received from all
conference participants or means for requesting permission to
speak.

In the case of using only one channel (down-link channel) to
transmit the audio signal to each location where a
communication terminal is located (this audio signal will
hereinafter be referred to as a down-link audio signal), the
problems listed below arise.

With a scheme that mixes audio signals of plural speakers
together, when two or more parties speak simultaneously, the
audio signals are mixed and the mixed sound is reproduced using
one sound driver (or loudspeaker). This degrades intelligibility
for the listener and makes it difficult for him to identify the
speakers. Furthermore, with the demand-to-speak scheme, each
participant would have to carry out some operation to transmit
a demand to speak whenever he wanted to speak in the
teleconference, and the communication control unit would have
to manage all the demands, thus preventing the participants
from conversing freely.

On the other hand, it is known in the art that a spatial sense
of each speaker's voice originating from a unique position
helps to identify the speaker and improves speech
intelligibility (D.R. Begault, "Multichannel Spatial Auditory
Display for Speech Communications," Journal of the Audio
Engineering Society, 42, pp. 819-826, 1994). Sound localization
as mentioned herein means the listener's judgement of the
position of the sound he is hearing. Usually, the perceived
position of a sound image coincides with the real position of
the sound source. However, techniques have been devised that
enable a listener to localize the sound image at a target
position.

Now, a brief description will be given of a typical scheme for
localization of multiple sounds at respective target positions.
As shown in Fig. 2, acoustic transfer functions such as
head-related transfer functions H_1L and H_1R from a sound
source 1 to the left and right ears of the listener in Fig. 1
are each convolved with an audio signal S_1. At the same time,
acoustic transfer functions H_2L and H_2R from a sound source 2
to the left and right ears are similarly convolved with an audio
signal S_2 different from the signal S_1. The audio signals
resulting from the convolution are mixed together and the mixed
audio signal is presented to both ears over a stereo headset.
By this, sound stimuli S_1*H_1L + S_2*H_2L and
S_1*H_1R + S_2*H_2R (where * denotes convolution), which are
equivalent to those when the audio signals reach both ears of
the listener from the sound sources 1 and 2, are given to the
left and right ears, respectively, as shown in Fig. 2. In such
an instance, the listener can localize sound images for the
audio signals S_1 and S_2 at the same spatial positions as those
of the sound sources 1 and 2 in Fig. 1. Other schemes are also
described in detail, for example, in J. Blauert, Gotoh and
Morimoto, "Spatial Hearing: The Psychophysics of Human Sound
Localization," (Cambridge, MA: MIT Press, 1983) and so forth.
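
The two-source scheme described above can be illustrated with a
minimal sketch. It assumes discrete-time signals and short
finite impulse responses standing in for the transfer functions
H_1L, H_1R, H_2L and H_2R; the function names are hypothetical
and the impulse responses in the usage note are made-up values,
not measured head-related transfer functions.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two lists of float samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_signals(a, b):
    """Sample-wise sum; the shorter signal is zero-padded."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def binaural_mix(s1, s2, h1l, h1r, h2l, h2r):
    """Return (left, right) = (S_1*H_1L + S_2*H_2L, S_1*H_1R + S_2*H_2R)."""
    left = add_signals(convolve(s1, h1l), convolve(s2, h2l))
    right = add_signals(convolve(s1, h1r), convolve(s2, h2r))
    return left, right
```

For instance, `binaural_mix([1.0, 0.0], [0.0, 1.0], [1.0, 0.5],
[0.25], [0.25], [1.0, 0.5])` places source 1 mostly in the left
ear signal and source 2 mostly in the right ear signal.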

A prior art example that applies the above-described findings
to multi-point audio communication is the teleconference
terminal described in Japanese Pat. Laid-Open Gazette No.
10744/92, for instance. As depicted in Fig. 3, the communication
terminal equipment proposed in the past has means 3L and 3R for
processing audio signals from other terminals. For the signal
processing, parameters related to respective target positions
are employed. For the control of these terminals, it is
indispensable to preassign an identification number (or a
terminal address) ID to each terminal (or conference
participant), and whenever a participant is to transmit his
audio signal, he has to transmit his number ID together with
the audio signal. The communication terminal shown in Fig. 3
receives an identification number ID which specifies the origin
of the received audio signal. That is, a received signal from
another communication terminal is separated by a signal
separation part 1 into an ID and an audio signal. In response
to the separated ID code, a switch control part 2 selects speech
signal processing means 3R and 3L, which perform convolution
with a pair of acoustic transfer functions corresponding to the
one of the spatial positions allotted to the terminal having
that ID. The speech signal from the signal separation part 1 is
fed to the selected pair of speech signal processing means 3R
and 3L and convolved with the pair of transfer functions to
reproduce a sound image which is localized at the allotted
spatial position. Accordingly, the introduction of this prior
art communication terminal at each location necessitates
procedures for transmission of the identification number, thus
restricting the feasibility of the prior art communication
system. These shortcomings hinder an economical implementation
of multi-point audio telecommunications of the type in which
voices of parties can be localized at respective positions.

For two-point teleconferencing service, there has also been
proposed a system in which the acoustic environment in a
conference room at a local station is detected by, for example,
a stereo microphone, and information on the environment is
coded and transmitted to the remote station (U.S. Patent No.
5,020,098, for instance). Application of this system to a
teleconference among three or more points, however, requires
communication channels to be connected among the respective
locations. In addition, each terminal must be provided with a
decoding device.

Fig. 4 shows another prior art system, in which terminals 4 are
each equipped with sound localization signal processing means
4A and are interconnected with one another over communication
channels of a network. In this case, the number C_M of
communication channels needed is at least M(M-1)/2, where M is
the number of terminals 4 to be interconnected. This connection
scheme is impractical because, as the number M increases, the
number of channels required to interconnect all possible pairs
increases rapidly, roughly in proportion to M^2.
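
The growth in channel count can be made concrete with a short
calculation. The sketch below compares the fully interconnected
arrangement of Fig. 4 with an arrangement in which each terminal
holds a single connection to one central unit; the latter count
of M connections is an assumption made here for comparison and
is not a figure stated in the text. The function names are
hypothetical.

```python
def mesh_channels(m):
    """Channels needed when every pair of m terminals is linked directly."""
    return m * (m - 1) // 2


def star_channels(m):
    """Connections needed when each of m terminals links only to one
    central unit (assumption: one bidirectional connection per terminal)."""
    return m


if __name__ == "__main__":
    for m in (3, 4, 8, 16):
        print(m, mesh_channels(m), star_channels(m))
```

With 16 terminals the pairwise arrangement already needs 120
channels, whereas the centralized count grows only linearly.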

As a modification of the method for realizing multi-point audio
telecommunication of the type in which the listener localizes
the voice of each party at a different spatial position through
the use of sound localization techniques similar to those
mentioned above, there has been proposed a method that conducts
communication among the terminals of two or more desired groups
of respective points, as described in, for example, Cohen M.,
Koizumi N. and Aoki S., "Design and Control of Shared
Conferencing Environments for Audio Telecommunication," Proc.
Int. Symp. on Measurement and Control in Robotics, pp. 405-412,
Nov. 1992.

This method also requires that each terminal be 
provided with audio signal processing means for 
processing the audio signal transmitted over the com- 
munication line from each location to localize the voice 
of each speaker at a different position and mixing means 
for mixing audio signals generated by the audio signal 
processing means. Further, it is necessary to specify the 
originating location and transmit the audio signal for 
each location. To meet this requirement, the communi- 
cation system to be used needs to be predetermined. 
This leaves unsettled the problem of requiring predeter- 
mination of the communication system used. 

It is therefore an object of the present invention to 
provide an audio communication control unit which per- 
mits the implementation of a multi-point teleconference 
that yields high intelligibility for multiple audio sounds in 
case of multiple simultaneous utterance without the 
need of equipping each terminal with high audio signal 
processing capability. 

Another object of the present invention is to provide an audio
communication control unit which allows any terminal
accommodated in a communication network to access and make use
thereof.

A further object of the present invention is to provide an
audio communication control unit which simultaneously
implements one or more communications among a desired
combination of connected terminals where each participant
localizes sounds originating from the speakers at the
respective terminals at respective positions.

SUMMARY OF THE INVENTION 

According to the present invention, an audio communication
control unit for teleconferencing, which is connected via a
communication network to a plurality of terminals, comprises:

    a switching part for switching audio signals received from
    N terminals via said communication network, N being an
    integer equal to or greater than three;

    N input channels connected to said switching part and
    supplied with the input audio signals from said N
    terminals, respectively;

    a channel branching part for branching each of said input
    audio signals from said N input channels to K branched
    audio signals of K branched channels, K being an integer
    equal to or greater than 2;

    a sound image control part for processing said K branched
    audio signals of said K branched channels corresponding to
    each of said N input channels with a corresponding one of N
    parameter sets each including K sound image control
    parameters of a predetermined kind or kinds to produce
    sound-image controlled audio signals of K branch channels
    corresponding to each of said N input channels, at least
    one of said N parameter sets being different from the other
    parameter sets according to target positions of said
    terminals;

    a mixing part for mixing said sound-image controlled audio
    signals of K branch channels corresponding respectively to
    said N terminals, for each branch channel, to thereby
    generate mixed audio signals of K channels; and

    a terminal-associated branching part for branching said
    mixed audio signals of K channels in correspondence with
    said N terminals for input into said switching part.
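
The signal path from the N input channels to the N sets of
K-channel outputs can be sketched as follows. This is only an
illustration under stated assumptions: the sound image control
parameters are taken to be simple per-channel gains (one of the
parameter kinds the description allows), all inputs have equal
length, and the function name `control_unit` is hypothetical.

```python
def control_unit(inputs, param_sets):
    """inputs: N equal-length lists of samples (one per input channel).
    param_sets: N lists of K gains (one parameter set per input channel).
    Returns N copies of the K mixed channels, one copy per terminal."""
    n = len(inputs)
    if n < 3:
        raise ValueError("N must be at least 3")
    k = len(param_sets[0])
    if k < 2:
        raise ValueError("K must be at least 2")
    length = len(inputs[0])
    mixed = [[0.0] * length for _ in range(k)]
    for signal, gains in zip(inputs, param_sets):
        # Channel branching: the same input feeds each of the K branches.
        for ch, gain in enumerate(gains):
            # Sound image control (level only in this sketch), then mixing
            # into the running sum for this branch channel.
            for t, sample in enumerate(signal):
                mixed[ch][t] += gain * sample
    # Terminal-associated branching: every terminal receives all K channels.
    return [[list(channel) for channel in mixed] for _ in range(n)]
```

Giving each input channel a different gain pair is what lets a
listener attribute each voice to a different direction; identical
parameter sets would collapse all voices onto one image.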


BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a diagram for explaining acoustic transfer functions
intended for sound localization;
Fig. 2 is a diagram for explaining an example of audio signal
processing intended for sound localization;
Fig. 3 is a block diagram showing an example of the
configuration of a terminal for conventional multi-point audio
telecommunication;
Fig. 4 is a block diagram showing an example of a network
arrangement in a conventional multi-point audio
telecommunication system;
Fig. 5 is a block diagram showing, by way of example, the
accommodation of communication channels for use with an audio
communication control unit 100 according to the present
invention;
Fig. 6 is a block diagram illustrating the basic construction
of the audio communication control unit according to the
present invention;
Fig. 7 is a block diagram showing an example of the terminal
configuration for use in the system of Fig. 5;
Fig. 8 is a block diagram illustrating an audio communication
control unit according to a first embodiment of the present
invention;
Fig. 9 is a waveform diagram for explaining a speaker
identifying method;
Fig. 10 is a block diagram showing an example of the
construction of an audio signal processing part 25 in Fig. 8;
Fig. 11 is a block diagram showing another example of the
construction of the audio signal processing part 25 in Fig. 8;
Fig. 12 is a timing chart for explaining a first principal
speaker identifying method and an example of the operation in
Fig. 11;
Fig. 13 is a timing chart for explaining a second principal
speaker identifying method and another example of the operation
in Fig. 11;
Fig. 14 is a block diagram illustrating a second embodiment of
the audio communication control unit according to the present
invention;
Fig. 15 is a block diagram showing an example of the
construction of a terminal-associated mixing control part
corresponding to each terminal in Fig. 14;
Fig. 16 is a block diagram showing a third embodiment of the
audio communication control unit according to the present
invention;
Fig. 17 is a block diagram showing an example of the
construction of a sound image processing part 8-1 in Fig. 16;
Fig. 18 is a diagram for explaining target positions for sound
localization;
Fig. 19 is a diagram for explaining combinations of terminals
belonging to one or more teleconferences;
Fig. 20 is a block diagram illustrating a fourth embodiment of
the audio communication control unit according to the present
invention;
Fig. 21A is a diagram showing an example of target positions
for sound localization in one teleconference by the embodiment
of Fig. 20;
Fig. 21B is a diagram showing an example of target positions
for sound localization in another teleconference by the
embodiment of Fig. 20;
Fig. 21C is a diagram showing target positions for sound
localization in two teleconferences by the embodiment of
Fig. 20;
Fig. 22 is a block diagram of a fifth embodiment of the present
invention illustrating an example of the construction of the
Fig. 20 embodiment;
Fig. 23 is a block diagram showing an example of the
construction of each mixing/branching part 17-P in Fig. 22;
Fig. 24 is a block diagram of a sixth embodiment of the present
invention illustrating a modified form of the Fig. 20
embodiment;
Fig. 25 is a block diagram of a seventh embodiment illustrating
an example of the construction of the Fig. 24 embodiment;
Fig. 26A is a diagram showing an example of target positions
for sound localization possible in one teleconference by the
embodiment of Fig. 24 or 25; and
Fig. 26B is a diagram showing an example of target positions
for sound localization possible in another teleconference by
the embodiment of Fig. 24 or 25.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Fig. 5 schematically illustrates the general configuration of a
multi-point teleconference system using the audio communication
control unit according to the present invention. The audio
communication control unit of the present invention, identified
generally by 100, has a switching part 11, which is connected
to a communication network 40 such as ISDN or a LAN and is
accessible by each terminal connected thereto. Owing to
limitations on the capacity and the throughput of the audio
communication control unit 100, the maximum number N of
conference participants (or the number of terminals) who are
allowed to simultaneously participate in a conference is
prescribed, N being an integer equal to or greater than 3. For
example, four conference participating terminals TM-1 to TM-4
are connected via the communication network 40 and the
switching part 11 to four input channels C_1 to C_4 of the N
input channels. The input channels C_1 to C_N are connected to
an audio signal mixing control part 10, constituting a
multi-point teleconference system which enables the conference
participants to talk to one another. As described later in
detail, the audio signal mixing control part 10 processes the
audio signal originating from each terminal by using one kind
of sound image control parameter relating to a sound image,
such as levels (or level, attenuation, amplification, etc.),
delays, phases and transfer functions, or a desired combination
thereof, so that at least one set of the sound image control
parameters operates on audio signals from one terminal and
other sets of the sound image control parameters operate on
audio signals from other terminals.

Fig. 6 illustrates in block form the basic configuration of the
audio communication control unit 100 of the present invention
for use in the system of Fig. 5. The switching part 11
selectively connects the communication channels from terminals
requesting to participate in a conference to the audio signal
mixing control part 10 via the N input channels C_1 to C_N. The
audio signal mixing control part 10 comprises: a channel
branching part 13 by which audio signals fed to the N input
channels C_1 to C_N from a maximum of N connected terminals are
each branched into audio signals on predetermined K branch
channels (where K is an integer equal to or greater than 2, and
in Fig. 6, K is set to 2, each corresponding to one of left and
right channels) B_JL and B_JR (J = 1, ..., N); a sound image
control part 14 which controls N sets of K-channel branched
audio signals by predetermined sound image control parameters;
a mixing part 15 which mixes, channel by channel, corresponding
ones of the N sets of sound-image controlled K-channel audio
signals to generate K-channel mixed audio signals; and a
terminal-associated branching part 16 which branches the
K-channel mixed audio signals into N sets of K-channel signals,
respectively, for input into the switching part 11. The channel
branching part 13, the sound image control part 14 and the
mixing part 15 constitute an audio signal processing part 25.
The switching part 11 passes the N sets of K-channel signals to
the N conference participating terminals TM-1 to TM-N. In
Fig. 6, two-channel (K=2) audio signals are each sent from the
switching part 11 to one of the participating terminals over
two down-link channels.

As described above, Fig. 6 shows the case of K=2, and the input
audio signals are each branched into two-channel branched audio
signals at one of branch points 3-1, 3-2, ..., 3-N in the
channel branching part 13. In Fig. 6 the two channels are shown
to correspond to left and right channels as in the prior art.
The parts related to the left channel are each identified by a
suffix L to their reference numerals and the parts related to
the right channel are identified by a suffix R. The sound image
control part 14 comprises N sets of signal processing parts
4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR and processes the
branched audio signals by using predetermined kinds of sound
image control parameters, respectively.

As described earlier, it is possible to use, as the sound image
control parameters, various kinds of parameters, such as level,
phase, delay and acoustic transfer functions. Brief
descriptions will be given below of the effect of each of these
parameters on sound images, assuming that each input audio
signal is branched into two at each branching point.

(a) In the case of using levels (any one of volume, attenuation
factor and amplification factor) as the sound image control
parameters, the direction of a sound image reproduced by left
and right loudspeakers in association with the input audio
signal can be set to a desired direction between the two
loudspeakers by controlling the relative levels of the left-
and right-branched audio signals corresponding to the input
audio signal.

(b) In the case of using phases (in-phase or inverted phase) as
the sound image control parameters, the sound image reproduced
by left and right loudspeakers can be provided with or deprived
of perspective by controlling the left and right branched audio
signals corresponding to the input audio signal to be in phase
or inverted in phase relative to each other.

(c) In the case of using delay as the sound image control
parameters, the direction of the sound image reproduced by left
and right loudspeakers (or a stereo headset) can be set to a
desired direction around a listener by controlling the relative
delay of the left and right branched audio signals
corresponding to the input audio signal.

(d) In the case of using acoustic transfer functions as the
sound image control parameters, the sound image reproduced by a
stereo headset can be localized at a desired spatial position
by convolving a pair of acoustic transfer functions
corresponding to the target position with the left and right
branched audio signals corresponding to the input audio signal.
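
Cases (a) to (c) can be sketched for a single input signal
branched into left and right channels. The sketch assumes
sample lists of floats and delays expressed in whole samples;
the function names are hypothetical. Case (d) would replace the
per-channel gain with a convolution by the transfer function
for that channel.

```python
def branch_with_level(signal, left_gain, right_gain):
    """(a) Relative level: scale the two branches by different gains."""
    return [left_gain * s for s in signal], [right_gain * s for s in signal]


def branch_with_phase(signal, inverted):
    """(b) Phase: the right branch is either in phase or inverted."""
    sign = -1.0 if inverted else 1.0
    return list(signal), [sign * s for s in signal]


def branch_with_delay(signal, left_delay, right_delay):
    """(c) Relative delay in samples; outputs are padded to equal length."""
    total = len(signal) + max(left_delay, right_delay)

    def delayed(d):
        out = [0.0] * d + list(signal)
        return out + [0.0] * (total - len(out))

    return delayed(left_delay), delayed(right_delay)
```

For example, `branch_with_delay(x, 0, 2)` delays the right
branch by two samples relative to the left, which a listener
would tend to hear as a shift of the image toward the left.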

The sound image control parameters are provided from a
parameter setting part 14C to the signal processing parts 4-1L,
4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR. The sound image control
parameters may be determined in accordance with, for example,
the number of conference participants. In the case of Fig. 6,
the audio signals from the signal processing parts 4-1L, 4-2L,
..., 4-NL are mixed by a mixer 5L in the mixing part 15 into a
left-channel mixed audio signal, whereas the audio signals from
the signal processing parts 4-1R, 4-2R, ..., 4-NR are mixed by
a mixer 5R in the mixing part 15 into a right-channel mixed
audio signal. Hence, the K-channel signals, which are
distributed from the terminal-associated branching part 16 to
the respective terminals TM-1 to TM-N, contain components
derived from the audio signals arriving from all the
participating terminals.

As depicted in Fig. 7, the terminals TM-1 to TM-N 
are each composed of a microphone MC, a transmitting 
part 51, a decoding part 52 and reproducing parts 53L 
and 53R. The received K-channel (K=2) encoded audio 
signal is decoded by the decoding part 52 into audio sig- 
nals of the respective channels, which are transduced 
by the reproducing parts 53L and 53R into sounds. 
Hence, the sounds to which the user of each terminal TM listens
may contain voices sent from all participating terminals.

According to the present invention, by selecting different
sound image control parameters for the N sets of branched audio
signals in the sound image control part 14, a participant at
each terminal TM can discern the sound of the voice originating
from at least one of the terminals from the sound of the voices
originating from the other remaining terminals. The properties
of the sound image to be controlled are those, such as spatial
position and spaciousness, that the listener perceives
psycho-acoustically or auditorily. For example, when the
reproducing parts 53L and 53R of the terminal are loudspeakers,
the sound image could be controlled by using, as the sound
image control parameters for the left- and right-channel audio
signals, any one of interchannel level difference, interchannel
delay difference and relative phase (in phase, opposite phase),
or a combination of the level difference and the time
difference. By using such predetermined sound image control
parameters, which are operated on the N sets of left and right
audio signals in the signal processing parts 4-1L, 4-1R, 4-2L,
4-2R, ..., 4-NL, 4-NR of the sound image control part 14 in
Fig. 6, a desired sound image could be reproduced at each
terminal. When a headset is used as the reproducing parts 53L
and 53R in Fig. 7, the number of channels is limited to K=2. By
convolving the N sets of left and right speech signals with
transfer functions corresponding to desired target positions of
the sound sources, as sound image control parameters, in the
sound image control part 14 in Fig. 6, a mixed sound is
reproduced by the reproducing parts 53L and 53R in Fig. 7 so
that the component originating from each terminal can be
localized at its desired target position.

A description will be given of embodiments of the present
invention in connection with the case where K=2; when the
reproducing parts of each terminal are loudspeakers, K ≥ 3 is
also possible.

With reference to the drawings, concrete operative 
examples of the invention will hereinbelow be de- 
scribed. 


FIRST EMBODIMENT 

Fig. 8 illustrates a first embodiment of the audio
communication control unit based on the basic configuration of
Fig. 6 according to the present invention, wherein a plurality
of terminals TM-1 to TM-M are connected via communication lines
40 to the audio communication control unit 100 of the present
invention. In this embodiment, a principal speaker as an origin
of audio signals is judged by monitoring the audio signals on
the input channels C_1 to C_N from the switching part 11
connected to a plurality of participating terminals. In the
audio communication control unit 100, the audio signals from
all the participating terminals are processed so that listeners
can distinguish the judged position of the sound originating
from the principal speaker from the judged position of the
sound originating from the speaker at any other terminal.

The audio communication control unit 100 of this embodiment
comprises the switching part 11, an audio signal and control
signal or video signal multiplexing/demultiplexing part 22, an
audio signal decoding part 23A, an utterance detection
processing part 23B, a speaker selecting part 24 for selecting
speakers whose sounds are to be mixed together, an audio signal
processing part 25, an echo cancelling part 26, audio signal
coding parts 27 and 28, a down-link audio signal selecting part
29, a signal processing control part 20 and an image display
control part 30. The switching part 11, the
multiplexing/demultiplexing part 22, the audio signal decoding
part 23A and the utterance detection processing part 23B each
perform processing corresponding to the terminal as the origin
of the audio signal and have a throughput or capability for
processing audio signals originating from the maximum number N
of simultaneously accessed terminals.

In Fig. 8 there is illustrated, as an example of the terminal
TM, a video conference terminal that transmits and receives
video information and speech information at the same time.
Since the presence of video information is irrelevant to the
present invention and since the video display control part 30
is not directly related to the subject matter of the invention,
no detailed description will be given of video display control.
However, video information could be utilized to operate the
conference in which the terminals TM-1 to TM-N are each to
participate and to control the combination of conference
participants. In such a case, a signal related to audio signal
mixing control is applied from the video display control part
30 to the signal processing control part 20.

The operation of the audio communication control unit 100 will
hereinafter be described in connection with the case where M
terminals (TM-1 to TM-M) are connected via the communication
lines 40 to the unit 100.

The communication lines 40 used are those capa- 
ble of interactive audio communication, such as N-ISDN 
lines, leased lines, analog telephone circuits, LAN cir- 
cuits, individual circuits, or multiplexed logical circuits. 
Additionally, it does not matter whether the communication
channels are wired or radio channels as long as the switching
part 11 is adapted to the type of communication network 40.
This embodiment will be described as using N-ISDN circuits
(assigning transmission bands of 64 kb/s for video and 64 kb/s
for audio).

For example, video conference terminals designed 
for N-ISDN lines can be used as the terminals TM-1 to 
TM-M. In this instance, the terminals TM-1 to TM-M 
need to have the function of receiving two-channel audio 
signals. 

The terminals TM-1 to TM-M are connected via the communication
network 40 to the switching part 11 of the audio communication
control unit 100. Video and audio signals and a control signal
for controlling the combination of participating terminals,
multiplexed into one channel according to a standard such as
ITU-T Recommendation H.221 and sent from the terminals TM-1 to
TM-M, are demultiplexed by the multiplexing/demultiplexing part
22. The video signal and the video display control signal thus
demultiplexed are sent to the video display control part 30.
Since the video display control is not directly relevant to the
present invention, no description will be given thereof.

The audio or speech control information is sent from the
multiplexing/demultiplexing part 22 to the signal processing
control part 20. As the speech control information, it is
possible to employ information such as a request to participate
in or leave the conference. The audio signals demultiplexed in
the multiplexing/demultiplexing part 22 are each decoded in the
audio signal decoding part 23A into, for example, a PCM-encoded
audio signal for subsequent processing. For the sake of
brevity, the signal will be referred to simply as an audio
signal in the processing described below.

The utterance detection processing part 23B detects speech by,
for instance, monitoring the power of the audio signal. When
detecting speech, the utterance detection processing part 23B
supplies the signal processing control part 20 with a control
signal representing the utterance. In Fig. 9 there is shown an
example of the utterance detecting scheme in the utterance
detection processing part 23B. On the basis of the input audio
signal (Fig. 9A), an integrated power IT over a unit time
(100 ms, for instance) (Fig. 9B) is estimated. Then the
integrated power value IT is compared with an ON-detection
threshold E_ON and an OFF-detection threshold E_OFF to judge
the utterance at the terminal.

With a first utterance identification scheme, when the
unit-time integrated power IT exceeds the ON-detection
threshold E_ON, utterance at the terminal concerned is
immediately judged, and when the integrated power IT falls
below the OFF-detection threshold E_OFF, it is immediately
decided that the terminal is in the non-speaking or silent
state. Therefore, utterance is judged during the diagonally
shaded periods (a-b, c-d, f-g) in Fig. 9C. According to the
first identification scheme, the utterance-silence judgement is
frequently switched.

A second utterance identification scheme differs
from the first scheme in that the second scheme judges the
utterance on the assumption that the utterance continues
for a certain period (T in Fig. 9D) after the unit-time
integrated power IT falls below the OFF-detection threshold
E_OFF. According to this scheme, utterance is judged during
the diagonally shaded periods (a-e, f-h) in Fig. 9D.
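
The two identification schemes can be illustrated with a minimal sketch. It assumes that the audio signal is a list of samples, that the unit time is expressed as a frame length in samples, and that the period T is expressed as a number of frames; the function names and the handling of power values lying between the two thresholds (the current state is held) are assumptions made for illustration and are not taken from Fig. 9.

```python
def integrated_power(samples, frame_len):
    """Integrated power IT: sum of squared samples over each unit-time frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_utterance(powers, e_on, e_off, hangover_frames=0):
    """Return one speaking (True) / silent (False) flag per frame.

    hangover_frames == 0 corresponds to the first scheme (immediate
    switching). hangover_frames > 0 corresponds to the second scheme:
    the speaking state is held for that many frames after the power
    falls below e_off.
    """
    flags = []
    speaking = False
    remaining = 0  # hangover frames still available
    for p in powers:
        if p > e_on:
            speaking = True
            remaining = hangover_frames
        elif p >= e_off:
            # Between the thresholds: hold the state (assumption).
            if speaking:
                remaining = hangover_frames
        elif speaking:
            if remaining > 0:
                remaining -= 1
            else:
                speaking = False
        flags.append(speaking)
    return flags
```

With a hangover of zero frames the judgement toggles at every threshold crossing, whereas a non-zero hangover bridges short pauses, which corresponds to the difference between Fig. 9C and Fig. 9D.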

The audio signal originated from the terminal where
utterance is detected in the utterance detection processing
part 23B is selected in the speaker selecting part 24 in
Fig. 8. The selected audio signal is provided to the audio
signal processing part 25 via any one of selected audio
signal channels A_1 to A_N. The audio signal processing
part 25 includes the channel branching part 13, the sound
image control part 14 and the mixing part 15 that are
principal components of the present invention.

The signal processing control part 20 operates on
control signals received from the multiplexing/demultiplexing
part 22 and the utterance detection processing part 23B or a
conference control signal from the video display control
part 30. Taking into account the number of persons currently
speaking, the number of persons requesting to speak and other
conditions, for example, a chairperson who must preferentially
be given the allowance to speak at all times, the signal
processing control part 20 determines which of the selected
audio signal channels, corresponding to the terminals, are to
have their audio signals mixed, and their priorities. The
speaker selecting part 24 connects the selected audio signal
channels A_1 to A_N to the input channel positions following
the determined priorities. In this embodiment, it is assumed
that the audio signal originated from the principal speaker
is assigned to the selected audio signal channel A_1 and
those of the second to N-th speakers to the selected audio
signal channels A_2 to A_N.

The signal processing control part 20 controls the
operation of the audio signal processing part 25 to mix
and distribute audio signals originated from multiple
speakers without deterioration of intelligibility of the
sound from the principal speaker. In this example the
audio signal processing part 25 assigns the audio signal
originated from the principal speaker to the left-channel
audio signal and the audio signals originated from the
other speakers to the right-channel signal, and supplies
both to the audio signal coding part 28.

The audio signal coding part 28 encodes and multiplexes
the two-channel stereo mixed audio signals from the audio
signal processing part 25 by a stereo encoder. The down-link
audio signal selecting part 29 corresponds to the
terminal-associated branching part 16 in Fig. 6 and, for the
communication lines corresponding to the terminals other
than that of the principal speaker, selects the encoded
stereo mixed audio signal from the audio signal coding
part 28. As for the line corresponding to the terminal of
the principal speaker, the stereo audio signal encoded by
the audio signal coding part 27 is selected. In this case,
however, the signal originated from the principal speaker is
cancelled by the echo cancelling part 26 from the left- and
right-channel mixed audio signals for echo suppression prior
to coding. The selected audio signals are each applied to
the multiplexing/demultiplexing part 22.

The multiplexing/demultiplexing part 22 multiplexes
the stereo audio signals from the down-link audio signal
selecting part 29 and video information from the video
display control part 30 and sends the multiplexed signals
to the terminals TM-1 to TM-M from the switching part 11
via the communication lines 40.

The audio signal mixing processing and sound image
control processing based on human hearing or auditory
characteristics and the prevailing custom in conferences
are performed in the audio signal processing part 25. As
mentioned previously herein, the audio signal processing
part 25 has the signal channel branching part 13, the sound
image control part 14 and the audio signal mixing part 15
in Fig. 6. This embodiment controls the branched audio
signal of each selected audio signal channel by using, as
the sound image control parameters, attenuation for
controlling the interchannel level difference and phase for
controlling the left- and right-channel audio signals to be
in-phase or opposite in phase.

Fig. 10 illustrates a concrete example of the audio
signal processing part 25, in which the branched audio
signals are controlled using the interchannel phase
relation as the sound image control parameter. A level
control part 14A and a phase control part 14B constitute
the sound image control part 14. The audio signals of the
selected audio signal channels A_1 to A_N are attenuated by
attenuators 4-1, 4-2, ..., 4-N of the level control part 14A
to 1/2^(1/2)-, 1/N^(1/2)-, ..., 1/N^(1/2)-fold levels,
respectively. The audio signals outputted from the
attenuators 4-1 to 4-N are branched at branch points 3-1,
3-2, ..., 3-N in the channel branching part 13 to left- and
right-channel signals on left and right branched channels
B_1L, B_1R, ..., B_NL, B_NR, respectively, which are fed to
the phase control part 14B, wherein they are controlled by
phase controllers 4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR
to be in-phase or 180 degrees out-of-phase with each other.
The sound image control parameters such as attenuation and
phase are set in the parameter setting part 14C under the
control of the signal processing control part 20.

The audio signal originated from the principal
speaker, that is, the signal of the selected audio signal
channel A_1, is attenuated by using the attenuator 4-1 to
1/2^(1/2) and branched at the branch point 3-1 to left- and
right-channel signals, which are controlled by the phase
controllers 4-1L and 4-1R to be in-phase with each other
and provided to the mixers 5L and 5R, respectively. The
left- and right-channel audio signals at the outputs of the
mixers 5L and 5R correspond to the left- and right-channel
audio signals at the receiving terminal. Hence, when
listening over a stereo reproduction system, the listener
at the receiving terminal is able to hear the audio sound
on the selected audio signal channel A_1 (originated from
the principal speaker) in perspective with its sound image
localized at the center of the reproduction system.

The audio signals of the selected audio signal channels
A_2 to A_N are branched at the branch points 3-2 to 3-N
into left and right channels after being attenuated by the
attenuators 4-2 to 4-N, for example, 1/N^(1/2)-fold (N
being the number of selected audio signal channels A_1 to
A_N) so that the sum of the speech power levels of the
audio signals on the selected audio signal channels A_2 to
A_N, reproduced at each terminal, may be equal to or
smaller than the level of the reproduced speech of the
principal speaker. The left-channel audio signals are held
in-phase by the phase controllers 4-2L to 4-NL and applied
to the mixer 5L, whereas the right-channel audio signals
are phase inverted (multiplied by -1) by the phase
controllers 4-2R to 4-NR and then applied to the mixer 5R.
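
The signal flow of Fig. 10 described above can be summarized in a short sketch. It assumes that each selected audio signal channel is a list of samples of equal length and that the first list is the principal speaker's channel A_1; the function name is hypothetical.

```python
import math


def mix_phase_scheme(channels):
    """Mix per Fig. 10: channels[0] is A_1, channels[1:] are A_2..A_N.

    The principal channel is attenuated 1/sqrt(2)-fold and fed in-phase
    to both mixers. Every other channel is attenuated 1/sqrt(N)-fold,
    fed in-phase to the left mixer and inverted to the right mixer.
    """
    n = len(channels)
    length = len(channels[0])
    left = [0.0] * length
    right = [0.0] * length
    for idx, samples in enumerate(channels):
        if idx == 0:
            gain, right_sign = 1 / math.sqrt(2), 1.0
        else:
            gain, right_sign = 1 / math.sqrt(n), -1.0
        for t, s in enumerate(samples):
            left[t] += gain * s
            right[t] += right_sign * gain * s
    return left, right
```

In this sketch the only difference between the principal and the subordinate channels on the right output is the sign, which is what produces the in-phase or opposite-phase relation between the two outputs.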

When presented with sounds in opposite phase
from the left and right channels in stereo reproduction,
the listener cannot perceive the sound images close
to his head in perspective. Through utilization of this
human hearing or auditory characteristic, the subordinate
audio signals fed to the selected audio signal channels
A_2 to A_N are perceived by the listener at each terminal
without perspective (i.e. without a sense of distance
about him) when he listens over the stereo reproduction
system. On the other hand, the sound originated from
the principal speaker is localized in a fixed position. The
attenuators 4-1 to 4-N in the level control part 14A of the
audio signal processing part 25 shown in Fig. 10 serve to
make the level of the sound originated from the principal
speaker reproduced at each terminal larger than the
sum of the levels of the sound originated from the other
speakers. The difference of localization between the
sound originated from the principal speaker and the
sound originated from the other speakers is provided
solely depending on whether the left- and right-channel
audio signals are controlled to be in-phase or opposite
in phase in the phase control part 14B.
Fig. 11 illustrates another embodiment of the audio
signal processing part 25, which is designed so that at
each terminal only the sound originated from the principal
speaker is reproduced by using the left-hand loudspeaker,
for instance, and mixed sound originated from all speakers
is reproduced by the right-hand loudspeaker with its power
level held equal to or lower than the power level of sound
originated from the principal speaker. In the right
channels B_1R to B_NR branched by the channel branching
part 13, there are introduced the attenuators 4-1R, 4-2R,
..., 4-NR of an attenuation factor N^(1/2); the attenuation
of the attenuator 4-1L in the branched left channel B_1L
for the principal speaker is set to zero and an attenuation
sufficiently larger than the attenuation factor N^(1/2) in
the right channel, for example, an infinite attenuation, is
set in each of the attenuators 4-2L to 4-NL of the left
channels B_2L to B_NL (that is, the channels are held OFF).
Accordingly, only the audio signal of the selected audio
signal channel A_1 for the principal speaker is applied to
the left-channel mixer 5L without being attenuated and the
signals of all the selected audio signal channels A_1 to
A_N are applied to the right-channel mixer 5R after being
attenuated by the attenuators 4-1R to 4-NR to an
appropriate volume of, say, 1/N^(1/2).
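
As an illustration of the Fig. 11 arrangement, the following sketch forms the left output from the principal speaker's channel alone and the right output from all channels scaled 1/N^(1/2)-fold. The representation of each channel as an equal-length list of samples, with the principal speaker first, and the function name are assumptions.

```python
import math


def mix_left_right_scheme(channels):
    """Mix per Fig. 11: channels[0] is the principal speaker's channel A_1.

    Left output: A_1 only, unattenuated (the other left channels are OFF).
    Right output: all channels A_1..A_N, each scaled by 1/sqrt(N).
    """
    n = len(channels)
    left = list(channels[0])
    scale = 1 / math.sqrt(n)
    right = [scale * sum(samples) for samples in zip(*channels)]
    return left, right
```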

The listener at each receiving terminal listens to the
sounds from the principal speaker and other speakers
at the same time, localizing them at different positions.
Hence it is possible to realize a multi-point teleconference
wherein the listener can clearly hear the sound originated
from the principal speaker at all times as well as the
sounds originated from other speakers.

Fig. 12 shows, by way of example, the principal
speaker determination scheme in the signal processing
control part 20 and the scheme of generating left- and
right-channel mixed audio signals in the audio signal
processing part 25. The principal speaker determining
scheme shown in Fig. 12 is as follows:

"of the terminals judged to be in utterance at a certain
time, the terminal first recognized to be in utterance is
regarded as the principal speaker terminal as long as its
speaking state lasts."

When only one terminal is in utterance, this terminal
is decided to be that of the principal speaker, whereas
when a plurality of terminals are simultaneously judged to
be in utterance, the terminal that started speaking
earliest among them is judged to be that of the principal
speaker at the point when the previous principal speaker
stops speaking or falls silent. In Fig. 12, Rows A to D
respectively show the utterance periods NA, NB, NC and ND
of the speakers at the terminals TM-1 to TM-4 as diagonally
shaded areas. Row E shows the sources of the left-channel
sound and Row F the sources of the right-channel sound. The
right-channel sound is produced by mixing together the
audio signals originated from those participants other than
the principal speaker.
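
The determination rule can be sketched as a small state machine. The sketch assumes that the utterance judgement is available per time step as a set of terminal identifiers; the function name, the data layout and the tie-break by identifier when two terminals start speaking in the same step are assumptions, not part of Fig. 12.

```python
def track_principal(frames):
    """Return the principal speaker terminal (or None) for each time step.

    frames: list of sets, each holding the ids of terminals judged to be
    in utterance during that step.
    """
    onset = {}        # terminal id -> step at which its current utterance began
    principal = None
    result = []
    for step, speaking in enumerate(frames):
        # Drop terminals that fell silent, then record new onsets.
        for term in list(onset):
            if term not in speaking:
                del onset[term]
        for term in speaking:
            onset.setdefault(term, step)
        # The principal speaker changes only when the current one falls silent.
        if principal not in speaking:
            if speaking:
                principal = min(speaking, key=lambda t: (onset[t], t))
            else:
                principal = None
        result.append(principal)
    return result
```

For example, if TM-2 begins speaking while TM-1 is still the principal speaker, TM-2 becomes the principal speaker only at the step in which TM-1 falls silent.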

Fig. 13 shows other examples of the principal
speaker determining scheme in the signal processing
control part 20 and the scheme of generating left- and
right-channel mixed audio signals by the audio signal
processing part 25 in Fig. 11. In the Fig. 13 example a
particular terminal (TM-1 in this case) is given preference
on the right to speak. This control scheme corresponds to
the prevailing custom of giving the chairperson or lecturer
preference on the right to speak. In Fig. 13, Rows A to D
show the components of the sound, NA to ND, at the
terminals TM-1 to TM-4, Row E the contents of the
left-channel audio signals and Row F how the right-channel
audio signals are mixed.

There are cases where the two identification
schemes described previously with respect to Fig. 9 are
each appropriate or inappropriate according to the type
of the conference. For example, when participants at
multiple terminals conduct free discussion on equal
terms, the first identification scheme is preferable, in
which the principal speaker is expected to be changed
quickly. When participants speak by turns, the second
identification scheme is favorable, in which undesired
changes of the principal speaker are expected to be avoided.

Accordingly, it is effective to provide the signal
processing control part 20 with means that has algorithms
for detection of the principal speaker such as exemplified
in Figs. 9, 12 and 13 and switches the control algorithms
through manipulation from the terminals TM-1 to TM-M as the
conference proceeds.

The speaker selecting part 24 is provided when the
sound image control parameters are set in the audio signal
processing part 25 described with respect to Fig. 10. In
the case where, in the audio signal processing part 25 of
Fig. 10, the audio signals of any pair of left and right
branched channels B_JL and B_JR can be set to be either
in-phase or 180 degrees out-of-phase with each other
and the attenuation factor can be set at 1 to 1/N^(1/2) for
any selected audio signal channel A_J, the speaker selecting
part 24 can be dispensed with by setting, in the parameter
setting part 14C, the sound image control parameters for
the input channel of the audio signal originated from the
principal speaker and for the other channels in the same
relation as that between the parameter for the principal
speaker's channel (selected audio signal channel A_1)
and the parameters for the other selected audio signal
channels A_2 to A_N in Fig. 10. Similarly, when the
attenuation factor for each branched channel can selectively
be set to any of 0, 1/N^(1/2) and 1/∞ in the parameter
setting part 14C, the speaker selecting part 24 is unnecessary.

Figs. 10 and 11 show the case where the phases of
the audio signals between the left and right branched
channels are controlled so that the sound image produced
by the audio signal originated from the principal
speaker can be distinguished from the sound images
produced by the audio signals originated from other
speakers, and the case where the distribution of the audio
signals to the left and right channels is controlled for
the same purpose. In these cases, the reproducing parts 53L
and 53R used at each terminal are loudspeakers placed in
front of the listener on the left and right. As the sound
image control parameter in the signal processing parts 4-1L
to 4-NL and 4-1R to 4-NR, the phase or attenuation can be
replaced with left and right acoustic transfer functions
described later herein with respect to Fig. 18. In such an
instance, headsets must be used as the reproducing parts
53L and 53R at all terminals.

SECOND EMBODIMENT 

In Fig. 14 there is illustrated a modified form of the
Fig. 8 embodiment in which the user at each terminal is
allowed to participate in multiple conferences. In the Fig.
14 embodiment, Q audio signal processing parts 25-1
to 25-Q are provided corresponding to Q conferences
and terminal-associated mixing control parts 21-1 to
21-M are provided corresponding to the terminals TM-1
to TM-N, respectively, which allow multiple conferences
to be held and enable each terminal to take part in two or
more of the conferences.

The conference participant instructs the speaker 
selecting part 24 of the audio communication control unit
100 to select one or more conferences which he wants 
to attend. When designating multiple conferences, the 
participant is required to specify one principal confer- 
ence to which his sound is mixed. Therefore, only one 
conference is determined for which the audio signal 
from that terminal is processed for mixing. As for the oth- 
er designated conferences, the audio signal from that 
terminal is not mixed and the participant only listens to 
mixed sounds from other participants in those confer- 
ences. 

Supposing that one of the logical conferences is a 
dialog among two or more specific members of the con- 
ference as shown in Fig. 14, the participant at one ter- 
minal can talk with a particular member while at the 
same time listening to sound of other participants in the 
conference. This enables all the conference participants
to have a natural dialog as if they were physically sitting in
the same conference room. In the Fig. 14 embodiment, 
the speaker selecting part 24 has its internal structure 
logically divided corresponding to multiple conferences 
(1 to Q) and performs, for each conference room, the 
same speaker detection as described previously with re- 
spect to Fig. 9. 

The Fig. 14 embodiment structurally differs from the
Fig. 8 embodiment in that audio signal processing parts
25-1 to 25-Q are provided in the same number Q as the
conferences capable of being held; one of the audio signal
processing parts 25-1 to 25-Q is assigned to each logical
conference; and the terminal-associated mixing control
parts 21-1 to 21-M are each provided at the output side of
one of the audio signal processing parts 25-1 to 25-Q. The
audio signal processing parts 25-1 to 25-Q in this
embodiment may be those depicted in Fig. 10 or 11, for
instance.

Fig. 15 schematically illustrates, by way of example,
the configuration of that one 21-J of the
terminal-associated mixing control parts 21-1 to 21-N which
corresponds to the terminal TM-J in the Fig. 14 embodiment.
The terminal-associated mixing control part 21-J is
composed of: conference selecting switches 7S-1 to 7S-Q
which are supplied with left- and right-channel audio
signals from the Q audio signal processing parts 25-1 to
25-Q; a left-channel mixer 2-L connected to left-channel
outputs of all the conference selecting switches 7S-1 to
7S-Q; and a right-channel mixer 2-R connected to
right-channel outputs of the conference selecting switches
7S-1 to 7S-Q. In response to a participating conference
designating control signal received from the terminal
TM-J, the signal processing control part 20 turns ON one
or more conference selecting switches 7S-P (1≤P≤Q)
corresponding to the designated conferences, thus selecting
the designated conferences.

The left and right audio signal outputs from the audio
signal processing parts 25-1 to 25-Q corresponding
to the conferences 1 to Q are branched by the
terminal-associated branching part 16 and provided to the
conference selecting switches 7S-1 to 7S-Q in the
terminal-associated mixing control part 21-J. As a result,
left- and right-channel audio signals from one or more
conferences designated by the terminal TM-J are selected
and fed to the left- and right-channel mixers 2-L and 2-R.
For example, when the terminal participates in two
conferences at the same time, the left-channel audio signals
from the two conferences are mixed together by the
left-channel mixer 2-L and outputted therefrom as a
left-channel audio signal, and the right-channel audio
signals from the two conferences are mixed together by the
right-channel mixer 2-R and outputted therefrom as a
right-channel signal. The left- and right-channel audio
signals thus generated are encoded in the corresponding
audio signal coding part 27-J in Fig. 14 and sent to
the corresponding conference participating terminal TM-J,
where the mixed speech from the two conferences is
reproduced.
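
The operation of one terminal-associated mixing control part can be sketched as follows. The sketch assumes that the stereo output of each conference is available as a pair of equal-length sample lists keyed by a conference number, and that the states of the conference selecting switches are given as the list of conference numbers whose switches are ON; the function name and data layout are hypothetical.

```python
def mix_selected_conferences(conference_outputs, selected):
    """Mix the stereo outputs of the conferences whose switches are ON.

    conference_outputs: dict mapping a conference number to a
        (left_samples, right_samples) pair.
    selected: conference numbers designated by the terminal.
    Returns the (left, right) mixed sample lists for that terminal.
    """
    chosen = [conference_outputs[c] for c in selected]
    if not chosen:
        return [], []
    left = [sum(vals) for vals in zip(*(l for l, _ in chosen))]
    right = [sum(vals) for vals in zip(*(r for _, r in chosen))]
    return left, right
```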

Instead of using the conference selecting arrangement
that performs the afore-mentioned terminal-associated
conference selection by the conference selecting
switches 7S-1 to 7S-Q in Fig. 15, it is also possible to
adopt an arrangement in which the terminal-associated
branching part 16 is formed by a switch matrix logically
having 2Q by (2Q×N) inputs/outputs and ON-OFF control
of its contacts is made by the signal processing control
part 20 on the basis of a conference selecting command
from the terminal to supply the terminal-associated
mixing control parts 21-1 to 21-N with only the audio
signal of the conference designated by the terminal.



THIRD EMBODIMENT 

In the first embodiment shown in Figs. 8 and 11, the 
principal speaker is judged and the audio signal origi- 
nated from him is assigned to the left channel and other 
participants' audio signals are mixed and assigned to 
the right channel. The audio signals from the left and 
right channels are sent to each participating terminal, 
where the sound is reproduced using one sound source 
for each channel. When this system is applied to com- 
munications among three or more terminals, audio sig- 
nals from two points are simultaneously mixed in the 
right channel, in which case the listener cannot localize 
their sounds at different positions. Additionally, when the 
principal speaker changes, the audio signal from each 
terminal is not always distributed to the same channel, 
so that the listener does not always localize the sound of
a given speaker at the same position. This hinders the identification of each
speaker and intelligibility. Fig. 16 illustrates an embodi- 
ment intended to overcome this defect. 

The Fig. 16 embodiment is also based on the basic
configuration of the present invention depicted in Fig. 6.
The audio communication control unit 100 of Fig. 16
processes audio signals of N participants by using
different sets of acoustic transfer functions as the sound
image control parameters so that the reproduced
sounds of the N participants are localized at different
spatial positions. This permits implementation of a
teleconference that simultaneously joins a maximum of N
terminals where the sounds of the speakers are intended
to be localized at different positions. In this instance,
however, each terminal is required to use a headset as the
reproducing parts 53L and 53R (Fig. 7). The terminal at
each point transmits an audio signal of one communication
line to the audio communication control unit 100,
which, in turn, transmits an audio signal of one
communication line back to the terminal at each point. The
audio signal conveyed by one communication channel
from the audio communication control unit 100 is obtained
by multiplexing stereo audio signals of two channels into
a one-channel signal.

In this embodiment, the sets of channel branch
points 3-1, ..., 3-N of the channel branching part 13 and
left and right signal processing parts 4-1L, 4-1R, 4-2L,
4-2R, ..., 4-NL, 4-NR of the sound image control part 14,
which correspond to the respective terminals, are
shown as sound image processing parts 8-1, 8-2, ...,
8-N, respectively. In Fig. 17 there is illustrated, by way
of example, the sound image processing part 8-1.
Based on the principle described previously with respect
to Fig. 2, the sound image processing part 8-1 convolves,
by convolvers 4-1L and 4-1R, acoustic transfer
functions H_1L and H_1R into left and right audio signals
branched at the channel branch point 3-1, respectively.
The audio signals resulting from the convolution are
applied as left- and right-channel audio signals to the
mixers 5L and 5R of the mixing part 15 in Fig. 16. The
transfer functions H_1L and H_1R, which are convolved with
the branched audio signals of the respective channels, can
be determined corresponding to the spatial positions at
which the reproduced sounds of the audio signals are
desired to be localized.

The switching part 11 selects J (where 1≤J≤M)
communication lines from an unspecified number of
communication lines 40 forming a communication circuit
network, where M represents the number of terminals
simultaneously connected to the network and usually
M≤N. Every selected communication line is connected
as two channels for each one of terminals that
simultaneously conduct audio communication. One of the two
channels carries the input audio signal in this example
and is connected to a decoding part 23-J (where J=1,
2, ..., N). The other channel carries the output audio
signal and is connected to a multiplexing part 22-J via the
input channel C_J. Each decoding part 23-J decodes the
audio signal inputted thereto from the terminal connected
thereto. The audio signal decoded in the decoding
part 23-J is applied to an amplification factor setting part
35 and an amplifier 36-J.

The signal processing control part 20 receives a
connection confirm signal and similar control signals
that are transmitted from the respective terminals via the
switching part 11. The signal processing control part 20
detects the number M of connected terminals from such
control signals and sends the detected number M of
connected terminals to the amplification factor setting
part 35 and the parameter setting part 14C. The amplifier
36-J amplifies the input audio signal with an amplification
factor G_J, which is determined in the amplification
factor setting part 35. For example, the amplification
factor G_J is determined such that the integrated power
IT of the audio signal from the amplifier 36-J is equal for
all channels.

The parameter setting part 14C sets acoustic transfer
functions H_JL(θ_J) and H_JR(θ_J) necessary for the sound
image processing part 8-J to synthesize an audio signal
for localization of the reproduced sound originated from
the terminal TM-J of each point J at a different target
position θ_J. The target positions θ_J and the acoustic
transfer functions H_JL(θ_J) and H_JR(θ_J) have a
one-to-one correspondence; hence, once the target position
θ_J is determined for each input signal, the acoustic
transfer functions H_JL(θ_J) and H_JR(θ_J) to be convolved
with each audio signal can be determined. In this example,
the target positions θ_J for the audio signals from
respective terminals are determined on the basis of the
number M of connected terminals. As exemplified in Fig. 18
wherein M=5, the target positions θ_J are determined at
equiangular intervals of Δθ=180/(M-1) degrees about
the listener over angular positions (+90°)-(0°)-(-90°)
from his left to right side in a horizontal plane. The
target positions θ_J for the terminals TM-J at the
respective points J are given by 90-180(J-1)/(M-1) degrees
according to the number M of connected terminals.
Therefore, the target position spacing Δθ is minimum in
the case of using the maximum number N of connectable
points (M=N).
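
The assignment of target positions can be checked with a few lines of code. The function name is hypothetical, and the rejection of M smaller than 2 is an assumption, since the expression above is undefined for a single terminal.

```python
def target_positions(m):
    """Target positions in degrees for terminals J = 1..M.

    Implements 90 - 180(J-1)/(M-1): +90 degrees is the listener's left,
    0 degrees is straight ahead and -90 degrees is the listener's right.
    """
    if m < 2:
        raise ValueError("at least two connected terminals are assumed")
    return [90 - 180 * (j - 1) / (m - 1) for j in range(1, m + 1)]
```

For M=5 this yields positions at +90, +45, 0, -45 and -90 degrees, i.e. a spacing of 45 degrees, which agrees with Δθ=180/(M-1).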

In the sound image processing part 8-J, as described
previously with respect to Fig. 17, the transfer
functions H_JL(θ_J) and H_JR(θ_J) set by the parameter
setting part 14C are convolved with the audio signal from
the amplifier 36-J, and the convolved outputs are applied
as left- and right-channel audio signals to the mixers
5L and 5R, respectively. In the case of binaural listening
to these left- and right-channel audio signals over a
headset, the listener can localize the sound image at the
target position θ_J. The left- and right-channel audio
signals from the sound image processing part 8-J are also
provided to delay parts D-JL and D-JR, respectively.

The mixer 5L mixes together all the left-channel audio
signals fed from the sound image processing parts
8-1 to 8-N and applies the resulting left-channel mixed
audio signal to a branch point 6L in the branching part
16. The mixer 5R mixes together all the right-channel
audio signals fed from the sound image processing
parts 8-1 to 8-N and applies the resulting right-channel
mixed audio signal to a branch point 6R. The branch
point 6L branches the left-channel mixed audio signal
fed from the mixer 5L to N cancelers 26-1L to 26-NL.
The branch point 6R branches the right-channel mixed
audio signal fed from the mixer 5R to N cancelers 26-1R
to 26-NR.
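
A minimal sketch of the sound image processing parts 8-1 to 8-N followed by the mixers 5L and 5R is given below. It assumes that each acoustic transfer function is represented in the time domain as an impulse response given as a list of coefficients, that all input signals have the same length and that all impulse responses have the same length; the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out


def localize_and_mix(signals, responses):
    """Convolve each terminal's signal with its left/right responses and mix.

    signals: list of N sample lists, one per terminal.
    responses: list of N (h_left, h_right) impulse-response pairs.
    Returns (per_terminal, left_mix, right_mix), where per_terminal holds
    the (left, right) outputs of each sound image processing part.
    """
    per_terminal = [
        (convolve(x, h_l), convolve(x, h_r))
        for x, (h_l, h_r) in zip(signals, responses)
    ]
    left_mix = [sum(v) for v in zip(*(l for l, _ in per_terminal))]
    right_mix = [sum(v) for v in zip(*(r for _, r in per_terminal))]
    return per_terminal, left_mix, right_mix
```

The per-terminal outputs are returned alongside the mixes because the same signals are also needed as the inputs of the delay parts D-JL and D-JR.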

On the other hand, the left-channel audio signal
applied to each delay part D-JL is delayed for a time t_JL
and provided to the canceler 26-JL. The delay t_JL is set
to the sum of the delay by the audio signal processing
in the mixer 5L and the delay by the audio signal
processing at the branch point 6L. Consequently, the
left-channel audio signal outputted from the delay part
D-JL and that component of the left-channel mixed audio
signal outputted from the branch point 6L which was
applied to the mixer 5L from the sound image processing
part 8-J become in-phase and cancel each other in the
canceler 26-JL. Accordingly, the audio signal component
received from the terminal TM-J at each point J is
eliminated from the left-channel mixed audio signal to be
branched to the terminal TM-J and hence an echo can be
prevented. Thus the audio signal that is sent back to the
terminal TM-J via the canceler 26-JL is only a mixed
version of audio signals from the terminals other than
TM-J. For the same reason as given above, the delay part
D-JR delays the right-channel audio signal from the sound
image processing part 8-J for a time t_JR and then applies
it to the canceler 26-JR. The delay t_JR is set to the sum
of the delay by the audio signal processing in the mixer 5R
and the delay by the audio signal processing at the branch
point 6R.
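
The cancellation described above amounts to subtracting a suitably delayed copy of a terminal's own contribution from the mixed signal. The sketch below assumes that the delays t_JL and t_JR are whole numbers of samples and that signals are sample lists; the function names are hypothetical.

```python
def delay(signal, d):
    """Delay a signal by d samples, zero-padding the start and keeping its length."""
    return ([0.0] * d + list(signal))[:len(signal)]


def cancel_own_signal(mixed, own_component, d):
    """Remove a terminal's own contribution from one channel of the mix.

    mixed: mixed signal as seen at the canceler input (already delayed by
        the mixer and branch point processing).
    own_component: this terminal's output from its sound image processing part.
    d: delay in samples matching the mixer and branch point processing.
    """
    delayed = delay(own_component, d)
    return [m - o for m, o in zip(mixed, delayed)]
```

If the delay d does not match the actual processing delay, the two components are no longer aligned and the terminal's own signal is only partly removed, which is why the delay is set to the sum of the mixer and branch point delays.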

The echo-suppressed left- and right-channel audio
signals outputted from the cancelers 26-JL and 26-JR
are provided to the multiplexing part 22-J, wherein they
are multiplexed and encoded, thereafter being sent via
the switching part 11 to the terminals TM-J at the points
J. In this way, each multiplexing part 22-J multiplexes
audio signals of the left and right channels into a
one-channel audio signal and encodes it. As a result, the
multiplexed one-channel audio signal is encoded and
then transmitted via the switching part 11 to the points
J (1≤J≤M) over one communication line. Thus the delay
difference between communication lines that would arise
from using two lines for the transmission of two-channel
stereo signals can be avoided, and the number of
communication lines used can be reduced. By decoding the
multiplexed audio signal and reproducing the sounds at
each terminal, the listener at that terminal can localize
the sounds from other terminals at desired target
positions θ_J. This enables each listener to easily
identify the other speakers and ensures high speech
intelligibility. Additionally, no sound image position
processing means is needed for the sound localization at
each point and an economical system can be implemented.

Incidentally, the Fig. 16 embodiment can also be
configured on the assumption that two communication lines
are used for the transmission of the two-channel stereo
audio signals to each of the points J (1≤J≤M) from the
audio communication control unit 100. In such an instance,
one communication line is used for each of the left- and
right-channel audio signals and the switching part 11 needs
to perform three-switching for each point J. Further, the
multiplexing and demultiplexing in the multiplexing part
22-J and at each point in Fig. 16 become unnecessary,
but two coding parts 22-JL and 22-JR are needed for
each terminal as a substitute for one multiplexing part
22-J. It is also necessary to connect the cancelers 26-JL
and 26-JR to the inputs of the coding parts 22-JL and
22-JR for inputting thereinto audio signals.

As described above, according to the embodiment of Fig. 16, the
listener at each terminal can localize sounds from other
terminals at different positions and hence can easily listen to
them even if the terminal is not provided with an audio signal
processing part intended for sound localization. Thus the
listener at each point can easily identify the speaker, and
excellent intelligibility can be obtained. Moreover, there is no
need to predetermine the communication system.

As referred to above, in the binaural listening over 
a headset or the like, too, it is possible to implement an 
economical multi-point audio telecommunication sys- 
tem in which the listener localizes sound from each 
speaker at a different position. When the number M of 
connected terminals is smaller than the maximum 
number N of connectable terminals, spacing of target 
positions can be increased accordingly. 

FOURTH EMBODIMENT 

Now, consider the case where terminals TM-1 to TM-6 at different
points communicate with one another via the audio communication
control unit as shown in Fig. 19. Let it be assumed that the
combinations of terminals TM-1 to TM-3 and TM-3 to TM-6 form
teleconferences X and Y, respectively. In this case, users of the
terminals TM-4 to TM-6 cannot listen to sounds of users at the
terminals TM-1 and TM-2, whereas the users at the terminals TM-1
and TM-2 cannot listen to sounds of the users at the terminals
TM-4 to TM-6. The user of the terminal TM-3 can listen to sound
of the user at any of the terminals TM-1, TM-2 and TM-4 to TM-6,
and all the users at the terminals TM-1, TM-2 and TM-4 to TM-6
can listen to sound of the user at the terminal TM-3. With this
method, the contents of communication can be concealed from users
who do not belong to the teleconference concerned, a user
belonging to multiple teleconferences can be made to recognize
the contents of communication in any one of the conferences, and
various other applications are feasible. Besides, by listening to
sounds of individual speakers while localizing their sound images
at different positions, the listener can easily identify the
speakers and understand the contents with high intelligibility;
furthermore, it can be expected that the listener and the
speakers develop better communication with each other, as if
they were in the same space.

With the audio communication control unit shown in Figs. 14 and
15, however, the user at the terminal TM-3 can select both
teleconferences X and Y and simultaneously listen to sounds in
both conferences, but he is allowed to speak in only a selected
one of the conferences X and Y. When the user at the terminal
TM-3 listens to sounds from both conferences X and Y at the same
time, the sounds received from the conferences are separately
reproduced from the left and right loudspeakers, but the sounds
from multiple terminals within the teleconference X or Y cannot
be localized at different positions.

In Fig. 20 there is illustrated the basic configuration of a
fourth embodiment of the audio communication control unit of the
present invention, intended to overcome the disadvantage
mentioned above. The main arrangement of the audio communication
control unit of this embodiment can be formed by: the switching
part 11; the sound image processing parts 8-J (J=1, 2, ..., N,
where N=6 in this example), each of which performs speech
processing for localizing the position of the speaker's sound
source by convolving transfer functions from the sound source to
the listener's two ears with the audio signal sent from one of
the terminals TM-1 to TM-6; a combination assignment part 19 for
assigning combinations of terminals in correspondence with
multiple teleconferences; mixing/branching parts 17-P (P=1, 2,
..., Q, where Q=2 in this example); and a mixing part 12. Each
mixing/branching part 17-P is composed of the left- and
right-channel mixers 5L and 5R and the branch points 6L and 6R.
The mixing part 12 comprises N left- and N right-channel mixers
2-JL and 2-JR (J=1, 2, ..., N, where N=6 in this example).
Components of the same kind are specified by suffixes J (1≤J≤N)
and P (1≤P≤Q). The components for processing the audio signals of
the left and right channels are similarly identified by suffixes
L and R, respectively.

The operation of the audio communication control unit according
to this embodiment will be described. The switching part 11
selects a communication line J (1≤J≤M) from among an unspecified
number of lines forming the circuit network, M representing the
number of terminals connected to the network at the same time.
Usually M≤N, where N represents the maximum number of connectable
terminals. In response to a communication start/end, terminal
designation, connection confirmation or similar control signal
received from one terminal, for instance, the switching part 11
selects the communication line J and couples it to the sound
image processing part 8-J via the input channel C_J in this
example. The sound image processing part 8-J is identical in
construction to that depicted in Fig. 17 and corresponds to a set
of one branch point 3-J and left- and right-channel signal
processing parts 4-JL and 4-JR in Fig. 6.

The sound image processing part 8-J performs 
processing for localization of sound originated from the 
terminal TM-J at a target position by convolving the 
transfer function with the audio signal sent from the ter- 
minal TM-J. Hence the audio signal that is outputted 
from the sound image processing part 8-J is a stereo 
audio signal. The stereo audio signals generated by the 
respective sound image processing parts 8-J are ap- 
plied to the combination assignment part 19, wherein 
they are sorted for each combination of terminals. In the
illustrated example, the terminals TM-1 to TM-3 and TM-3 to TM-6
are shown to belong to the teleconferences X and Y, respectively,
as depicted in Fig. 19.

The stereo audio signals, classified by the combination
assignment part 19 into those belonging to the conference X and
those belonging to the conference Y, are fed to the
mixing/branching parts 17-1 and 17-2, respectively, wherein the
audio signals from the terminals belonging to the same conference
are mixed by the mixer 5L or 5R for each of the left and right
channels. The mixed audio signals of the teleconference X are
distributed by the left and right branch points 6L and 6R in the
mixing/branching part 17-1 to the left- and right-channel mixers
2-1L to 2-3L and 2-1R to 2-3R in the mixing part 12 that
correspond to all the terminals TM-1 to TM-3 belonging to the
same teleconference X. Similarly, the mixed audio signals of the
teleconference Y are distributed by the left- and right-channel
branch points 6L and 6R in the mixing/branching part 17-2 to the
left- and right-channel mixers 2-3L to 2-6L and 2-3R to 2-6R in
the mixing part 12 that correspond to all the terminals TM-3 to
TM-6 belonging to the same teleconference Y. Each pair of mixers
2-JL and 2-JR mixes, for each channel, all audio signals of the
teleconferences to which the corresponding terminal belongs,
thereby generating stereo audio signals. The stereo audio signals
thus obtained are transmitted via the switching part 11 to those
of the terminals TM-1 to TM-6 which correspond to the
teleconference.
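
The routing just described can be illustrated with a minimal sketch. It
is not taken from the specification: the function name `route`, the
dictionary layout and the use of a single sample per channel are
assumptions made for illustration only. Each conference is mixed per
channel (mixers 5L and 5R), the mix is branched to every member (branch
points 6L and 6R), and the per-terminal mixers 2-JL and 2-JR add up
whatever arrives. No echo cancellation is modeled, so a terminal's own
signal is included in what it receives.

```python
def route(stereo, conferences):
    """Sketch of the Fig. 20 signal flow for one sample instant.

    stereo:      {terminal: (left, right)} output of each sound image
                 processing part 8-J (hypothetical data layout)
    conferences: {name: [member terminals]} as set by the combination
                 assignment part 19
    Returns {terminal: [left, right]} as produced by mixers 2-JL / 2-JR.
    """
    out = {t: [0, 0] for t in stereo}
    for members in conferences.values():
        # Mixers 5L and 5R: mix all members of one conference per channel.
        left = sum(stereo[t][0] for t in members)
        right = sum(stereo[t][1] for t in members)
        # Branch points 6L and 6R: distribute the mix to every member.
        for t in members:
            out[t][0] += left
            out[t][1] += right
    return out


# Distinct powers of two make it easy to see which sources reach whom.
stereo = {1: (1, 0), 2: (2, 0), 3: (4, 0), 4: (8, 0), 5: (16, 0), 6: (32, 0)}
conferences = {"X": [1, 2, 3], "Y": [3, 4, 5, 6]}
received = route(stereo, conferences)
```

In this sketch terminal 3, which belongs to both conferences, receives
the sum of both mixes, while terminals 1 and 4 receive only the mix of
their own conference.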

In this example, the audio signals that are sent to the terminals
TM-1 to TM-6 are stereo audio signals, and the user of each
terminal can listen to sounds originated from the other
terminals, localizing them at target positions determined by the
transfer functions convolved with the audio signals in the sound
image processing parts 8-J, as shown in Figs. 21A and 21B. That
is, since in the embodiment of Fig. 20 the audio signals
originated from the terminals TM-1 to TM-3 are convolved with the
transfer functions, mixed with one another for each channel and
then transmitted to the terminals TM-1 to TM-3, only the users at
these terminals can listen to the sounds originated from the
terminals TM-1 to TM-3, localizing their sounds at target
positions corresponding to the transfer functions convolved with
the audio signals in the sound image processing parts, as shown
in Fig. 21A. The teleconference formed by this combination of
terminals will hereinafter be identified as a teleconference X.
Likewise, transfer functions are convolved with the audio signals
originated from the terminals TM-3 to TM-6, and the audio signals
are mixed with one another for each channel and sent to the
terminals TM-3 to TM-6; hence, the listeners at these terminals
can listen to the sounds originated from the terminals TM-3 to
TM-6, localizing their sounds at target positions corresponding
to the transfer functions convolved with the audio signals in the
sound image processing parts, as shown in Fig. 21B. The
teleconference formed by this combination of terminals will
hereinafter be identified as a teleconference Y. In this
instance, since the user of the terminal TM-3 belongs to both of
the teleconferences X and Y, he can listen to sounds originated
from all the terminals TM-1 to TM-6 of both teleconferences X and
Y, localizing their sounds at different target positions as
depicted in Fig. 21C.

FIFTH EMBODIMENT

A description will be given, with reference to Figs. 22 and 23,
of a concrete example of the audio communication control unit 100
of the basic configuration shown in Fig. 20. Suppose that the
audio communication control unit 100 of this embodiment exchanges
audio signals with each terminal over a pair of down- and up-link
communication lines. The audio communication control unit 100 of
this embodiment controls a maximum of Q teleconferences formed by
a maximum of N terminals. Each terminal sends a one-channel
digital audio signal to the audio communication control unit 100,
which, in turn, sends a one-channel digital audio signal to each
terminal. The one-channel audio signal from the audio
communication control unit 100 is a one-channel multiplexed
version of the two-channel stereo signals generated in the unit
100.

Since in Fig. 22 the switching part 11, the decoding part 23-J
(J=1, 2, ..., N), the signal processing control part 20, the
amplification factor setting part 35, the amplifier 36-J
(J=1, ..., N), the parameter setting part 14C, the sound image
processing part 8-J (J=1, ..., N) and the multiplexing/coding
part 22-J (J=1, ..., N) are identical in construction and
operation to those in the Fig. 16 embodiment, no description will
be repeated. This embodiment differs from the Fig. 16 embodiment
in the provision of the combination assignment part 19, a
conference participating terminal selecting part 9C, a
mixing/branching part 17-P (P=1, ..., Q), a conference selecting
part 7C, a conference selecting switch 7-P (P=1, ..., Q) and a
mixing part 12. The combination assignment part 19 has Q by N
terminal selecting switches 9_P-J (P=1, ..., Q and J=1, ..., N),
and the mixing part 12 has N pairs of mixers 2-JL and 2-JR
(J=1, ..., N).

As depicted in Fig. 23, the mixing/branching part 17-P is made up
of mixers 5L and 5R, branch points 6L and 6R, delay parts D-JL
and D-JR (J=1, ..., N) and cancelers 26-JL and 26-JR
(J=1, ..., N). The functions of the parts characteristic of this
embodiment will be described below. As described previously with
respect to Fig. 16, the audio signal from each terminal is
provided to the sound image processing part 8-J via the decoding
part 32-J and the amplifier 36-J.

The signal processing control part 20 receives from each terminal
via the switching part 11 control signals as to communication
start/end, connection confirmation, the conference membership of
the terminal and so forth. Based on these control signals, the
signal processing control part 20 detects the number M of
connected terminals TM-1 to TM-n, their communication start/end
and their conference membership. The signal processing control
part 20 sends information on the connected terminals and the
number M of connected terminals to the amplification factor
setting part 35 and the parameter setting part 14C, sends the
communication start/end information to the conference
participating terminal selecting part 9C and the conference
selecting part 7C, and sends information on the conference
membership of each of the connected terminals TM-1 to TM-n to the
conference participating terminal selecting part 9C.

The parameter setting part 14C sets the acoustic transfer
functions H_L(θ_j) and H_R(θ_j) necessary for the sound image
processing part 8-J to perform processing for generating an audio
signal whose reproduced sound, originated from each terminal
TM-J, is localized at a different target position θ_j. Since the
target positions θ_j and the acoustic transfer functions H_L(θ_j)
and H_R(θ_j) have a one-to-one correspondence, the acoustic
transfer functions can be set once the target positions are
determined. In this embodiment, the target positions θ_j for the
sounds originated from the respective terminals are determined
based on the number M of connected terminals. As shown in Fig.
21C, the target positions θ_j are determined at equiangular
intervals of 180/(M-1) degrees about the listener over the
angular range (+90°)-(0°)-(-90°), from his left to his right
side, in a horizontal plane. That is, the target position θ_j for
the point J is determined by 90-180(J-1)/(M-1) degrees.

Since M=6 in the example of Fig. 21C, the target positions θ_j
for the terminals TM-1 to TM-6 are as follows:

θ_1 = 90° - 180° × (1-1)/(6-1) = +90°
θ_2 = 90° - 180° × (2-1)/(6-1) = +54°
θ_3 = 90° - 180° × (3-1)/(6-1) = +18°
θ_4 = 90° - 180° × (4-1)/(6-1) = -18°
θ_5 = 90° - 180° × (5-1)/(6-1) = -54°
θ_6 = 90° - 180° × (6-1)/(6-1) = -90°
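
The rule for the target positions can be checked with a short sketch.
The function name `target_positions` is hypothetical, and the handling
of fewer than two terminals (placing the single source straight ahead
at 0°) is an assumption, since the formula is undefined for M=1.

```python
def target_positions(m):
    """Target angles in degrees for terminals J = 1..m.

    Implements 90 - 180 * (J - 1) / (M - 1), positive angles being on
    the listener's left. For m < 2 the formula is undefined; placing a
    lone terminal at 0 degrees is an assumption of this sketch.
    """
    if m < 2:
        return [0.0] * m
    return [90.0 - 180.0 * (j - 1) / (m - 1) for j in range(1, m + 1)]


positions = target_positions(6)
# positions == [90.0, 54.0, 18.0, -18.0, -54.0, -90.0]
```

The result for M=6 reproduces the six values listed above, spaced 36°
apart.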

The sound image processing part 8-J convolves the acoustic
transfer functions H_L(θ_j) and H_R(θ_j), set in the parameter
setting part 14C for the terminal TM-J, with the audio signal fed
from the amplifier 36-J, generating left- and right-channel
stereo audio signals. Listening binaurally to sounds reproduced
from the stereo audio signals, the listener localizes the sound
image at the target position θ_j.
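
As a rough illustration of this step, the sketch below convolves a
one-channel signal with a left and a right impulse response in the
time domain. The names `convolve` and `localize` are hypothetical, and
the impulse responses used in the example are arbitrary placeholders,
not measured acoustic transfer functions.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out


def localize(mono, h_left, h_right):
    """Return (left, right) channels for one terminal's mono signal.

    h_left and h_right stand in for the time-domain counterparts of
    H_L and H_R for the terminal's target position (placeholders here).
    """
    return convolve(mono, h_left), convolve(mono, h_right)


# Placeholder responses: the right ear gets a delayed, attenuated copy.
left, right = localize([1.0, 0.5], h_left=[1.0], h_right=[0.0, 0.5])
```

A practical implementation would more likely use block-wise or
frequency-domain convolution; the direct form is chosen here only
because it is short and easy to follow.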

The left- and right-channel audio signals from the sound image
processing part 8-J are distributed to the Q terminal selecting
switches 9_1-J, 9_2-J, ..., 9_Q-J. Based on the control
information about communication start/end and the conference
membership of each connected terminal instructed by the signal
processing control part 20, the conference participating terminal
selecting part 9C determines terminal selecting information and
sends it to the terminal selecting switch 9_P-J. For instance,
upon opening or closure of the teleconference P to which the
terminal TM-J belongs, the conference participating terminal
selecting part 9C transfers a control signal to the terminal
selecting switch 9_P-J to permit or inhibit the passage
therethrough of audio signals. The terminal selecting switch
9_P-J responds to the control signal to permit or inhibit the
passage therethrough of audio signals. As a result, for example,
in the combination of terminals shown in Fig. 21C, the audio
signals originating from the terminals TM-1 to TM-3 are assigned
to the mixing/branching part 17-1 and the audio signals from the
terminals TM-3 to TM-6 are assigned to the mixing/branching part
17-2.

Turning now to Fig. 23, the internal construction of the
mixing/branching part 17-P will be described. The left- and
right-channel audio signals fed from each terminal selecting
switch 9_P-J are applied to the mixers 5L and 5R, respectively,
and at the same time, they are provided to the delay parts D-JL
and D-JR as well. The mixer 5L mixes together the N inputted
left-channel audio signals and outputs the mixed left-channel
audio signal to the branch point 6L. The mixer 5R similarly mixes
together the N inputted right-channel audio signals and outputs
the mixed right-channel audio signal to the branch point 6R. The
branch point 6L branches the mixed left-channel audio signal
inputted thereto to the N cancelers 26-JL (J=1, ..., N).
Likewise, the branch point 6R branches the mixed right-channel
audio signal inputted thereto to the N cancelers 26-JR
(J=1, ..., N).

The delay part D-JL delays, by a time τ_JL, the left-channel
audio signal fed from the terminal selecting switch 9_P-J and
applies the delayed left-channel audio signal to the canceler
26-JL. The delay τ_JL is selected to be the sum of the delay by
the audio signal processing in the mixer 5L and the delay by the
audio signal processing in the branch point 6L. In consequence,
the left-channel audio signal outputted from the delay part D-JL
and that left-channel audio signal component in the left-channel
mixed audio signal from the branch point 6L which was outputted
from the terminal selecting switch 9_P-J are synchronized with
each other. The delay τ_JR of the delay part D-JR is similarly
determined, and the right-channel audio signal outputted from the
delay part D-JR and that right-channel audio signal component in
the right-channel mixed audio signal from the branch point 6R
which was outputted from the terminal selecting switch 9_P-J are
synchronized with each other.

The cancelers 26-JL and 26-JR subtract the delayed audio signals
fed from the delay parts D-JL and D-JR from the audio signals fed
from the branch points 6L and 6R, respectively. As a result, the
above-mentioned components cancel each other, and in the channel
corresponding to each terminal TM-J, a mixed audio signal
originating from the other channels K (K≠J) is obtained. This
mixed audio signal is applied to the conference selecting switch
7-P. That is, the audio signal originating from each terminal
TM-J is excluded from the audio signal that is transmitted to
that terminal TM-J. Thus an echo attributable to the audio
communication control unit 100 of this embodiment can be
cancelled.
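
The mix-then-cancel arrangement can be sketched as follows for one
channel. The names `mix` and `mix_minus` are hypothetical. The sketch
assumes ideal sample-synchronous arithmetic, so the delays τ_JL and
τ_JR, which in the unit compensate for the processing time of the
mixer and the branch point, are zero here.

```python
def mix(signals):
    """Mixer 5L (or 5R): sample-wise sum of all input signals."""
    return [sum(samples) for samples in zip(*signals)]


def mix_minus(signals):
    """Cancelers 26-JL (or 26-JR): remove each terminal's own signal.

    Returns one output per input; output J is the mix of every other
    input K != J. Delays are taken as zero, an assumption of this
    sketch.
    """
    total = mix(signals)
    return [[t - s for t, s in zip(total, own)] for own in signals]


# Three terminals, two samples each; integers keep the sums exact.
outputs = mix_minus([[1, 2], [10, 20], [100, 200]])
```

Mixing once and subtracting N times needs only one N-input mixer per
channel, rather than N separate mixers each with N-1 inputs.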

Turning back to Fig. 22, the conference selecting part 7C
determines conference selecting information in response to the
teleconference P opening/closure information instructed by the
signal processing control part 20. This conference selecting
information is transferred to the conference selecting switch
7-P. For example, when the teleconference P is opened or closed,
a control signal is transferred to the conference selecting
switch 7-P to permit or inhibit the passage therethrough of audio
signals. The conference selecting switch 7-P responds to the
control signal from the conference selecting part 7C to permit or
inhibit the passage therethrough of the audio signal outputs from
the mixing/branching part 17-P, i.e. from the cancelers 26-JL and
26-JR.

The inter-combination mixers 2-JL and 2-JR respectively add the
left and right channels of the combinations of terminals P_s
selected by the conference selecting switches 7-P (P=1, ..., Q)
from the J-th channels of the Q mixing/branching parts 17-P
(P=1, ..., Q) corresponding to the Q combinations of terminals.
The reference character P_s is the number of the combination of
terminals (or the conference number) for which audio signals are
mixed together, and a maximum of Q combinations can be selected
in the range 0<P_s≤Q. The corresponding left- and right-channel
audio signals of the selected combinations of terminals are mixed
together, and the mixed audio signals are sent to each terminal
TM-J, at which the user can listen to sounds from all the other
terminals belonging to the selected multiple terminal
combinations (multiple teleconferences). The audio signal
originated from the terminal TM-J is sent to all the other
terminals selecting those terminal combinations which include the
terminal TM-J. Each multiplexing/coding part 22-J multiplexes and
encodes the left- and right-channel audio signals fed from the
inter-combination mixers 2-JL and 2-JR. That is, the
multiplexing/coding part 22-J multiplexes the stereo audio
signals corresponding to the left and right channels into a
one-channel audio signal and encodes it. As a result, the encoded
one-channel multiplexed signals are independently applied to the
switching part 11 for each terminal TM-J, and the one-channel
multiplexed audio signal is transmitted to each terminal TM-J
(1≤J≤M) over one communication line.

According to the embodiment of Fig. 22, even when two
teleconferences are being held by the terminals TM-1 to TM-6 as
shown in Fig. 21C, the listeners in both conferences can each
localize sounds originated from the other terminals at different
target positions spaced 36 degrees apart; hence, it is possible
to simultaneously realize a teleconference X exclusive to the
terminals TM-1 to TM-3 and a teleconference Y exclusive to the
terminals TM-3 to TM-6. In this case, the listener at the
terminal TM-3 can listen to sounds originated from the terminals
TM-4 to TM-6 as well as from the terminals TM-1 and TM-2.
Additionally, even if a teleconference by all the terminals TM-1
to TM-6 is in progress, the teleconference X by the terminals
TM-1 to TM-3 can be implemented. In the above, the operation of
the audio communication control unit 100 has been described on
the assumption that a certain terminal such as TM-3 takes part in
the multiple teleconferences X and Y; in the case where a
terminal such as TM-1 participates in the teleconference X alone
in Fig. 21C, it is sufficient to apply audio signals from the one
conference selecting switch corresponding to the conference X to,
for example, the mixers 2-1L and 2-1R corresponding to the
terminal TM-1.

As depicted in Fig. 21C, even in applications of the multi-point
audio communication to, for instance, a dialog between particular
speakers in a general teleconference or monitoring of each
teleconference, the listener can localize sound originated from
each terminal at a different target position. This assists
identification of each speaker and improves the intelligibility.
The advantage of this embodiment is that the sound image
processing part for sound localization need not be introduced
into each terminal TM-J. Thus it is economically feasible to
implement a teleconferencing service that enables all conference
participants to develop natural communications with the other
members as if they were in the same space.

As described above, the embodiment of Fig. 22 has Q
mixing/branching parts 17-1 to 17-Q corresponding to Q
teleconferences, and upon receiving from each terminal TM-J the
control signal designating one or more teleconferences in which
the user of that terminal intends to participate (speak), the
signal processing control part 20 applies the control signal to
the conference participating terminal selecting part 9C. The
conference participating terminal selecting part 9C turns ON one
or more of the Q terminal selecting switches 9_P-J (P=1, ..., Q)
so that the audio signal originated from the terminal TM-J is
passed to the mixing/branching parts 17-P corresponding to the
teleconferences specified by the control signal. Therefore, the
audio signal originated from the terminal TM-J can be connected
to the one or more teleconferences designated by the terminal
TM-J, and its user can join the teleconferences. Additionally,
the Fig. 22 embodiment has Q conference selecting switches 7-1 to
7-Q connected to the outputs of the Q mixing/branching parts 17-1
to 17-Q. Upon receiving from each terminal TM-J the control
signal designating one or more teleconferences which the user of
that terminal intends to monitor, the signal processing control
part 20 passes the control signal to the conference selecting
part 7C. The conference selecting part 7C responds to the control
signal to turn ON the conference selecting switches connected to
the one or more mixing/branching parts corresponding to the
teleconferences specified by the control signal, thereby passing
the audio signals of the designated one or more teleconferences
to the terminal TM-J.
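
The combined effect of the two sets of switches can be summarized in a
small model for one channel and one sample instant. Everything here is
a simplification made for illustration: the function name
`terminal_outputs`, the set-based representation of switch states, and
in particular the assumption that the monitoring choice can be made
separately for each terminal.

```python
def terminal_outputs(signals, speak, monitor):
    """One-channel, one-sample model of the Fig. 22 signal flow.

    signals: {terminal: sample}
    speak:   {conference: set of terminals whose switch 9_P-J is ON}
    monitor: {conference: set of terminals for which the output of
              mixing/branching part 17-P is passed on} (assumed to be
              selectable per terminal in this sketch)
    """
    result = {}
    for j in signals:
        total = 0
        for p, listeners in monitor.items():
            if j not in listeners:
                continue  # conference P is not passed to terminal J
            mixed = sum(signals[k] for k in speak[p])  # mixer 5L / 5R
            own = signals[j] if j in speak[p] else 0   # canceler 26-J
            total += mixed - own                       # mixer 2-J
        result[j] = total
    return result


signals = {1: 1, 2: 2, 3: 4, 4: 8}
speak = {"X": {1, 2, 3}, "Y": {3, 4}}
monitor = {"X": {1, 2, 3}, "Y": {1, 3, 4}}  # terminal 1 only monitors Y
heard = terminal_outputs(signals, speak, monitor)
```

In the example, terminal 1 speaks only in conference X but also
monitors conference Y, so it receives the signals of terminals 2 and 3
from X plus those of terminals 3 and 4 from Y.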

Hence, by transmitting a control signal to the audio
communication control unit of the present invention as needed,
the user at each terminal TM-J can change, join or leave the
communications which he monitors or in which he participates.

In the audio communication control unit 100 of Fig. 22, the
two-channel audio signal (or stereo signal) can be sent to each
terminal TM-J over a two-channel communication line instead of a
one-channel communication line. In this case, one communication
line is used for the audio signal of each channel, and the
switching part 11 is required to switch three lines for the input
from and output to each terminal TM-J. This avoids multiplexing
in the multiplexing/coding part 22-J and demultiplexing at each
terminal TM-J. In such an instance, however, two coding parts are
required for the left and right channels, and the construction
becomes correspondingly more complex.

SIXTH EMBODIMENT 

Fig. 24 illustrates the basic configuration of a modified form of
the embodiment shown in Fig. 20. The requirement of practical use
is the same as in the embodiments depicted in Figs. 20 and 22.
The audio communication control unit of this embodiment is
identical in construction to the Fig. 20 embodiment except that
the combination assignment part 19 is provided at the stage
preceding the sound image processing part 8-J. Since the
combination assignment part 19 is provided at the input side of
the sound image processing part 8-J, the audio signal processing
for sound localization is carried out after a combination of
connected terminals is determined. This allows setting of the
target positions of sounds originated from the terminals TM-1 to
TM-3 or TM-3 to TM-6 for the teleconferences X and Y,
respectively, as shown in Figs. 26A and 26B.


SEVENTH EMBODIMENT 

In Fig. 25 there is illustrated a concrete example of the basic
configuration shown in Fig. 24, the parts corresponding to those
in Fig. 22 being identified by the same reference numerals. The
construction and functions of this embodiment are mostly similar
to those in the Fig. 22 embodiment. This embodiment also provides
a multi-point teleconferencing service that enables each terminal
to participate in multiple teleconferences at the same time and
the user at each terminal to listen to sounds originated from the
other terminals, localizing their sounds at different target
positions. Further, this embodiment is common to the Fig. 22
embodiment in that the sound image processing part can be
dispensed with at each terminal TM-J or in the combination P of
terminals. A description will be given of this embodiment,
focusing on the differences between it and the embodiment of Fig.
22.

The signal processing control part 20 receives, from the
respective terminals via the switching part 11, such control
signals as those on communication start/end, connection
confirmation and the membership of the teleconferences P assigned
by combinations of connected terminals TM-J. The signal
processing control part 20 detects from these control signals the
information on the connected terminals TM-J, the number M of
connected terminals, communication start/end, the membership of
the teleconferences P and the number of terminals belonging to
each teleconference P. Additionally, the signal processing
control part 20 sends information on the detected terminals and
the number M of connected terminals to the amplification factor
setting part 35, sends the communication start/end information to
the conference participating terminal selecting part 9C and the
conference selecting part 7C, sends the membership of each
connected terminal TM-J in the teleconferences P to the
conference participating terminal selecting part 9C, and sends
the number M_P of terminals belonging to each teleconference P to
the parameter setting part 14C.

For each combination of terminals P, the parameter setting part
14C sets in the sound image processing parts 8_P-J the acoustic
transfer functions H_L(θ_PJ) and H_R(θ_PJ) that are convolved
with the audio signals originated from all terminals TM_P-J of
the combination P and that correspond to the target positions
θ_PJ. In this embodiment, the target position θ_PJ for sound
originated from each terminal TM-J is determined on the basis of
the number M_P of terminals belonging to the teleconference P
detected in the signal processing control part 20. In the same
manner as exemplified in Fig. 21C, the respective target
positions θ_PJ are determined at equiangular intervals of
180/(M_P-1) degrees about the listener over the angular range
(+90°)-(0°)-(-90°), from his left to his right side, in a
horizontal plane. Letting the terminals TM-J belonging to the
teleconference P be numbered J_P (1≤J_P≤M_P) in sequential order,
the target positions θ_PJ are determined by
90-180(J_P-1)/(M_P-1) degrees, as described previously.

The one-channel audio signal originated from each terminal TM-J
is distributed to the Q terminal selecting switches 9_P-J
(P=1, ..., Q). Each terminal selecting switch 9_P-J controls the
passage therethrough of the one-channel audio signal in response
to a control signal sent from the conference participating
terminal selecting part 9C, and the audio signal having passed
through the terminal selecting switch 9_P-J is applied to the
corresponding sound image processing part 8_P-J. In each sound
image processing part 8_P-J the acoustic transfer functions
H_L(θ_PJ) and H_R(θ_PJ), set in the parameter setting part 14C,
are convolved with the audio signal outputted from the terminal
selecting switch 9_P-J to obtain a two-channel audio signal,
which is fed to the mixing/branching part 17-P. There are
provided N (J=1, ..., N) sound image processing parts 8_P-J for
each set P of terminals, whereas in Fig. 22 the number of sound
image processing parts 8-J is only N. The terminal selecting
switch 9_P-J in Fig. 22 differs from its counterpart in this
embodiment of Fig. 25 in that the former interrupts the
two-channel audio signal, whereas the latter interrupts the
one-channel audio signal. The mixing/branching part 17-P is
exactly identical in construction and operation to that in Fig.
23.

The Fig. 25 embodiment differs from the embodiment of Fig. 22 in
the order of processing audio signals. In the embodiment of Fig.
25, the one-channel audio signals originated from the respective
terminals TM-J are grouped for each combination P (P=1, ..., Q)
of terminals, after which a two-channel audio signal for sound
localization at the respective target positions is generated for
each teleconference P. Accordingly, the respective terminals TM-J
belonging to each teleconference P are allowed to set different
target positions θ_PJ independently for each teleconference P.
That is, it is possible to set in the parameter setting part 14C
the acoustic transfer functions H_L(θ_PJ) and H_R(θ_PJ) as sound
image control parameters for enabling the listener to localize
sounds originated from the terminals TM-J belonging to each
teleconference P at the respective target positions θ_PJ.

Now, a description will be given of how
the spacing of target positions for sounds originated
from respective terminals TM-N in each teleconference
P is increased on the basis of the number M_P of
terminals belonging to the teleconference P. Consider the
application of this method to the combinations of terminals
shown in Figs. 26A and 26B. Since the teleconference
X is held among three terminals as shown in Fig. 26A,
the target positions are spaced 90° apart. The target
positions for the terminals TM-1, TM-2 and TM-3 are
sequentially distributed at angular positions
(+90°)-(0°)-(-90°) about the listener from his left to right
side. Since the teleconference Y is held among four
terminals, the target positions are spaced 60° apart. The
target positions for sounds originated from the terminals
TM-3, TM-4, TM-5 and TM-6 are sequentially distributed at
angular positions (+90°)-(+30°)-(-30°)-(-90°) about the
listener from his left to right side.
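
The spacing rule used in these examples can be checked with a
short sketch. It assumes the positions start at +90° on the
listener's left and step toward -90° in equal intervals of
180/(M - 1) degrees; the function name target_positions is
hypothetical.

```python
def target_positions(num_terminals):
    """Return equally spaced target angles in degrees, from +90
    (listener's left) to -90 (listener's right)."""
    if num_terminals < 2:
        raise ValueError("at least two terminals are needed to space positions")
    step = 180.0 / (num_terminals - 1)
    return [90.0 - step * j for j in range(num_terminals)]


teleconference_x = target_positions(3)  # three terminals, 90 degree spacing
teleconference_y = target_positions(4)  # four terminals, 60 degree spacing
all_connected = target_positions(6)     # six terminals, 36 degree spacing
```

With six terminals the spacing shrinks to 36°, which is why
positions derived from the number of terminals in each
teleconference are spread more widely than positions derived
from the number of all connected terminals.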

For comparison, consider the case where the target
positions are determined using the number M of all
connected terminals. Since the number M of all connected
terminals is 6, the target positions are spaced 36° apart,
and as shown in Fig. 21C, the target positions for sounds
originated from the terminals TM-1 to TM-6 are
sequentially distributed at angular positions
(+90°)-(+54°)-(+18°)-(-18°)-(-54°)-(-90°) about the listener
from his left to right side. In the embodiments of Figs. 21A
and 21B, since the combination is assigned after the
processing for sound localization, the target positions
cannot be set independently for each teleconference P;
hence, the target position for sound originated from one
terminal is fixed regardless of the combination of
terminals. In such an instance, the target position
distribution for the teleconference X among the terminals
TM-1 to TM-3 is confined to the range from the left side
(+90°) of the listener to the front of him on the left
(+18°) as depicted in Figs. 21A and 21B. In the
teleconference Y involving the terminals TM-3 to TM-6, the
target positions are distributed over the range from the
front of the listener on the left (+18°) to his right side
(-90°).

As described above, the embodiments of Figs. 24
and 25 allow setting of the target positions for each of
the teleconferences. Additionally, by setting the target
positions according to the number M_P of terminals belonging
to each combination (i.e. teleconference) P, the angular
range of the distribution and the spacing of the target
positions for sounds originated from each terminal can
be wider than in the embodiments of Figs. 20 and 22.
Consequently, these embodiments allow the listener to
identify each speaker more easily and provide further
improved intelligibility than the embodiments of Figs.
20 and 22.

According to the Fig. 25 embodiment, when the
number of terminals participating in a teleconference
changes, the target positions of sounds originated from
the participating terminals can be updated accordingly.
In such an instance, the target positions of sounds
originated from all the conference participating terminals
can be determined following a model for arrangement
(a set of acoustic transfer functions H_L(θ_J) and H_R(θ_J))
of the target positions of sounds originated from
respective terminals, predetermined by the signal processing
control part 20 of the audio communication control unit
100 in accordance with respective numbers of
participating terminals. That is, when the number M of
conference participants changes in response to a request to
leave or participate in the teleconference, the target
positions to be assigned to the remaining participants are
renewed by referring to the arrangement model according to
the updated number of participants, and the corresponding
sets of acoustic transfer functions H_L(θ_J) and H_R(θ_J)
are selected according to the renewed target positions
and each set in one of the sound image processing parts
8-J. As an initial procedure of the teleconference, it is
also possible to predetermine possible target positions
according to the number of participants and allow the
participants to customize those positions.
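
The renewal of target positions on a change in the number of
participants can be sketched as a table lookup. This is only an
illustration under stated assumptions: the arrangement model is
represented as a table of angles indexed by participant count,
a single participant is placed at 0°, and the selection of the
corresponding acoustic transfer functions is omitted. The names
arrangement_model and renew_positions are hypothetical.

```python
def arrangement_model(max_terminals):
    """Predetermine target angles (degrees) for every participant
    count up to max_terminals."""
    model = {0: [], 1: [0.0]}  # a lone participant at 0 is an assumption
    for m in range(2, max_terminals + 1):
        step = 180.0 / (m - 1)
        model[m] = [90.0 - step * j for j in range(m)]
    return model


def renew_positions(model, participants):
    """Assign the predetermined angles for the current count to the
    participants in their sequential order."""
    return dict(zip(participants, model[len(participants)]))


MODEL = arrangement_model(8)
participants = ["TM-3", "TM-4", "TM-5", "TM-6"]
before = renew_positions(MODEL, participants)

participants.remove("TM-5")  # TM-5 requests to leave
after = renew_positions(MODEL, participants)
```

After TM-5 leaves, the three remaining terminals are respaced
from 60° intervals to 90° intervals.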

In the case where the sound image control
parameters are set by the parameter setting part 14C in the
embodiments of Figs. 22 and 25, the target positions
assignable to conference participants can be estimated
in the signal processing control part 20 upon detection
of the number M of terminals participating in each
teleconference. While in the above the parameter setting
part 14C has been described as determining which
participants are assigned to which estimated target positions,
it is also possible to allow customization of a currently
assigned target position to a desired one even while the
teleconference is in progress. For example, the audio
communication control unit 100 presets information about
target positions determined for all the participants. When
a user at a terminal changes the target position to a
desired one during a teleconference, the terminal transmits
to the audio communication control unit 100 a
request-to-change signal indicating the desired position. In
response to the request-to-change signal, the signal
processing control part 20 of the audio communication
control unit 100 replaces, for example, the current position
of the requesting terminal with the desired position and
sends the new assignment information to all the participants.
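
The request-to-change handling can be sketched as follows. The
data layout (a mapping from terminal name to angle) and the
function name handle_change_request are assumptions made for
illustration; the specification does not define a message
format.

```python
def handle_change_request(assignments, terminal, desired_angle):
    """Replace the requesting terminal's target angle and build the
    assignment information to send to every participant."""
    if terminal not in assignments:
        raise KeyError(f"{terminal} is not a participant")
    updated = dict(assignments)        # leave the caller's table untouched
    updated[terminal] = desired_angle
    # Every participant receives a copy of the whole new assignment.
    notifications = {peer: dict(updated) for peer in updated}
    return updated, notifications


current = {"TM-1": 90.0, "TM-2": 0.0, "TM-3": -90.0}
updated, notifications = handle_change_request(current, "TM-2", 45.0)
```

The sketch only moves the requesting terminal. Whether other
terminals should be shifted to keep the spacing even is left
open here, as it is in the text.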

In the embodiments of Figs. 8 and 14, it is also
possible to employ an arrangement in which switches SW-1
to SW-N are connected in series to the respective
channels at the output side of the utterance detection
processing part 23B as indicated by the broken lines, and
the utterance detection processing part 23B judges
utterance on each channel and holds the switch SW-J in
that channel OFF except during the utterance period,
thereby suppressing noise originated from that channel.
Utterance can be judged depending on whether the
integrated power of the audio signal exceeds the threshold
E_ON as described previously herein with reference
to Fig. 8. In the embodiments of Figs. 16, 22 and 25,
too, it is possible to assign switches SW-1 to SW-N in
series to the outputs of the decoding parts 23-1 to 23-N
as indicated by the broken lines, judge utterance on each
channel according to the audio signal on the channel,
and hold the channel ON by the amplification factor setting
part 35 only during its utterance period.
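
The switch control described above can be sketched as a simple
power gate. The sketch assumes that "integrated power" means the
sum of squared samples over a fixed-length frame and that an
OFF switch outputs silence; the frame length, the threshold
value and the function names are hypothetical.

```python
def integrated_power(frame):
    """Sum of squared samples over one frame (assumed definition)."""
    return sum(sample * sample for sample in frame)


def gate_frames(frames, e_on):
    """Pass a frame only while its integrated power exceeds e_on;
    otherwise output silence of the same length (switch OFF)."""
    gated = []
    for frame in frames:
        if integrated_power(frame) > e_on:
            gated.append(list(frame))
        else:
            gated.append([0.0] * len(frame))
    return gated


frames = [[0.01, -0.02], [0.5, 0.4]]  # background noise, then speech
output = gate_frames(frames, e_on=0.1)
```

A practical detector would normally add a hangover time so that
the switch does not chop the tail of an utterance; that
refinement is not shown.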

As described above in detail, the audio
communication control unit according to the present invention
branches the audio signal from each terminal to multiple
channels, mixes the branched audio signals originated
from the respective terminals to produce a
multiple-channel mixed audio signal for each branched channel
and transmits the multiple-channel mixed audio signal
to each terminal after branching it into respective
channels. Hence the sound originated from at least one
teleconference participant can be reproduced at each
terminal in distinction from sounds originated from the
other participants, avoiding the requirement of audio signal
processing at the terminals for sound localization at
desired target positions.

It will be apparent that many modifications and
variations may be effected without departing from the
scope of the novel concepts of the present invention.

Claims 

1. An audio communication control unit for
teleconferencing which is connected via a communication
network to a plurality of terminals, comprising:

a switching part for switching audio signals
received from N terminals via said communication
network, N being an integer equal to or
greater than three;

N input channels connected to said switching
part and supplied with the input audio signals
from said N terminals, respectively;

a channel branching part for branching each of
said input audio signals from said N input
channels to K branched audio signals of K branched
channels, K being an integer equal to or greater
than 2;

a sound image control part for processing said K
branched audio signals of said K branched
channels corresponding to each of said N input
channels with a corresponding one of N parameter
sets each including K sound image control
parameters of predetermined kind or kinds to
produce sound-image controlled audio signals
of K branch channels corresponding to each of
said N input channels, at least one of said N
parameter sets being different from the other
parameter sets according to target position of
said terminals;

a mixing part for mixing said sound-image
controlled audio signals of K branch channels
corresponding respectively to said N terminals, for
each branch channel, to thereby generate
mixed audio signals of K channels; and

a terminal-associated branching part for
branching said mixed audio signals of K
channels in correspondence with said N terminals
for input into said switching part.

2. The audio communication control unit of claim 1,
further comprising: a speaker selecting part provided
between said N input channels and said channel
branching part, for selecting input audio signals to be
mixed together from said input audio signals inputted
into said N input channels via said switching part and
for outputting said selected input audio signals; N
selected audio signal channels for applying said selected
input audio signals from said speaker selecting part
to said channel branching part; and a signal
processing control part for controlling said speaker
selecting part so that one of said input audio signals
to be processed by said at least one of said N
parameter sets is outputted to a predetermined one
of said N selected audio signal channels.

3. The audio communication control unit of claim 1,
comprising a signal processing control part for deciding
a top-priority one of said N input audio signals as
an audio signal of a principal speaker and the other
remaining input audio signals as audio signals of
other speakers; wherein said branching number K
is two and wherein:

said channel branching part branches each of
said N input audio signals fed thereto from said
N input channels into first and second branch
channel audio signals and outputs them as said
K branched audio signals; and

said sound image control part includes a phase
control part which, under the control of said
signal processing control part, sets said first and
second branch channel audio signals
corresponding to said principal speaker's audio
signal to be in-phase with each other and sets said
first and second branch channel audio signals
corresponding to said other speakers' audio signals
to be in opposite phases to each other.

4. The audio communication control unit of claim 2,
wherein said branching number K is two and wherein:

said signal processing control part decides a
top-priority one of said N input audio signals as an
audio signal of a principal speaker and the other
remaining input audio signals as audio signals
of other speakers;

said speaker selecting part outputs, on the
basis of the results of decision by said signal
processing control part, said principal
speaker's audio signal and said other speakers' audio
signals to said predetermined one of said N
selected audio signal channels and the other
remaining selected audio signal channels,
respectively, for input to said channel branching
part;

said channel branching part branches each of
said N input audio signals applied thereto and
outputs first and second branch channel audio
signals as said branched audio signals of said
K branch channels; and

said sound image control part includes a phase
control part which, under the control of said
signal processing control part, sets said first and
second branch channel audio signals
corresponding to said principal speaker's audio
signal from said predetermined one selected
audio channel to be in-phase with each other and
sets said first and second branch channel
audio signals corresponding to said other
speakers' audio signals from said other selected
audio channels to be in opposite phases to each
other.

5. The audio communication control unit of claim 3 or
4, wherein said sound image control part includes
an attenuation part which attenuates said other
speakers' audio signals to a level lower than that of
said principal speaker's audio signal under the
control of said signal processing control part.

6. The audio communication control unit of claim 1,
wherein said branching number K is two and wherein:

said channel branching part branches each of
said N input audio signals fed thereto into first
and second branch channel audio signals and
outputs them as said K branched audio signals;

said signal processing control part decides a
top-priority one of said N input audio signals as an
audio signal of a principal speaker and the other
remaining input audio signals as audio signals
of other speakers; and

said signal processing control part sets said N
parameter sets to said sound image control part
such that said sound image control part
attenuates said second branch channel audio signal
corresponding to said decided principal
speaker's audio signal by a first value sufficiently
larger than the attenuation value of said first branch
channel audio signal corresponding to said
decided principal speaker's audio signal and
attenuates said first branch channel audio signals
corresponding to said other speakers' audio
signals by a second value sufficiently larger
than the attenuation value of said second
branch channel audio signals corresponding to
said other speakers' audio signals.

7. The audio communication control unit of claim 2,
wherein said branching number K is two and wherein:

said signal processing control part decides a
top-priority one of said N input audio signals from
said N input channels as an audio signal of a
principal speaker and the other remaining input
audio signals as audio signals of other
speakers;

said speaker selecting part outputs said
principal speaker's audio signal and said other
speakers' audio signals to said predetermined
one of said N selected audio channels and the
other remaining selected audio signal
channels, respectively, for input to said channel
branching part;

said channel branching part branches each of
said N input audio signals applied thereto into
first and second branch channel audio signals
and outputs them as said K branched audio
signals; and

said sound image control part includes an
attenuation part which, under the control of said
signal processing control part, attenuates said
second branch channel audio signal from said
predetermined one selected audio channel by
a first attenuation value sufficiently larger than
the attenuation value of said first branch
channel audio signal from said predetermined one
selected audio channel and attenuates each of
said first branch channel audio signals from said
remaining selected audio channels by a second
value sufficiently larger than the attenuation
value of said second branch channel audio
signals from said remaining selected audio
channels.

8. The audio communication control unit of claim 2,
further comprising an utterance detection processing
part for monitoring levels of said input audio
signals fed through said N input channels via said
switching part from respectively corresponding said
N terminals and for detecting the utterance at each
of said N terminals, and wherein said signal processing
control part decides a principal speaker on the
basis of the utterance detected by said utterance
detection processing part and controls said speaker
selecting part according to the decision result.

9. The audio communication control unit of claim 3 or
6, further comprising an utterance detection
processing part for monitoring levels of said input
audio signals fed through said N input channels via
said switching part from respectively corresponding
said N terminals and for detecting utterance at each
of said N terminals, and wherein said signal processing
control part decides a principal speaker on the
basis of the utterance detected by said utterance
detection processing part and controls said sound
image control part according to the decision result.

10. The audio communication control unit of claim 2,
further comprising Q audio signal processing parts
each composed of said channel branching part,
said sound image control part and said mixing part,
Q being an integer equal to or greater than two, and
N conference selecting parts, and wherein: said
terminal-associated branching part branches said
K-channel mixed audio signals from each of said Q
audio signal processing parts in correspondence
with said N terminals; said N conference selecting
parts each select one or more groups from Q
groups of said K-channel mixed audio signals
branched for each terminal by said
terminal-associated branching part in response to the control of
said signal processing control part, mix said
selected one or more groups of K-channel mixed
audio signals for each channel and output the mixed
signals as a group of K-channel audio signals; and
in response to a control by said signal processing
control part based on a conference participation
request signal, said speaker selecting part outputs
said input audio signals from the participating
terminals to said selected audio signal channels
corresponding to said one or more audio signal
processing parts.

11. The audio communication control unit of claim 10,
wherein each of said conference selecting parts
mixes together said K-channel mixed audio signals
from one or more of said audio signal processing
parts designated by said conference participation
request signal from said corresponding terminal
and outputs the mixed signal as said K-channel
mixed audio signal to be distributed to said
corresponding terminal.

12. The audio communication control unit of claim 1,
further comprising control for determining each of
said N parameter sets which correspond to N target
positions of sound sources different for respective
said N terminals, wherein said sound image control
part operates each of said N parameter sets on the
corresponding K channel branched audio signals to
produce K channels of said sound image controlled
audio signal for each terminal.

13. The audio communication control unit of claim 12,
wherein the branching number K is two and each of
said N parameter sets is a pair of acoustic transfer
functions.

14. The audio communication control unit of claim 10,
11, 12 or 13, wherein said signal processing control
part detects the number of terminals participating in
a teleconference by detecting signals requesting to
participate in said teleconference from said N
terminals, determines said target positions for said
participating terminals according to said detected
number of participating terminals, determines said
sound image control parameters of N sets
corresponding to said determined target positions and
provides said determined sound image control
parameters to said sound image control part.

15. The audio communication control unit of claim 14,
wherein said signal processing control part detects
the number N of said terminals connected thereto
via said switching part and determines target
positions for said terminals to be symmetric left-right
positions at intervals of 180/(N-1) degrees.

16. The audio communication control unit of claim 1,
further comprising a cancelling part for cancelling
from each of said K-channel mixed audio signals
distributed by said terminal-associated branching
part to each of said N terminals, respectively, the
component of each of said K-channel sound-image
controlled audio signals from said sound image
control part corresponding to said each terminal.

17. The audio communication control unit of claim 1,
further comprising a multiplexing part for multiplexing
said K-channel mixed audio signals, distributed
by said terminal-associated branching part in
correspondence with each of said terminals, into a
one-channel audio signal for input into said switching
part.

18. The audio communication control unit of claim 1,
wherein Q sets of said mixing part and said
terminal-associated branching part are provided, Q
being an integer equal to or greater than 2, and said audio
communication control unit further comprises: a
combination assignment part for applying said
K-channel sound-image controlled audio signal from
said sound image control part, which corresponds
to each of said N terminals, to designated one or
more mixing parts; and an inter-combination mixing
part for mixing together, for each channel, said
K-channel mixed audio signals distributed from one or
more of said terminal-associated branching parts
and for outputting said channel-associated mixed
audio signals to said each terminal.

19. The audio communication control unit of claim 1,
wherein Q sets of said channel branching part, said
sound image control part, said mixing part and said
terminal-associated branching part are provided, Q
being an integer equal to or greater than 2, and said
audio communication control unit further comprises:
a combination assignment part for applying said
input audio signal from each of said N terminals via
said switching part to designated one or more of
said channel branching parts; and an
inter-combination mixing part for mixing together, for each
channel, said K-channel mixed audio signals
distributed from designated one or more of said
terminal-associated branching parts and outputting said
channel-associated mixed audio signal to said each
terminal.

20. The audio communication control unit of claim 18,
wherein said K channels are left and right channels
and said sound image control parts each generate,
for each terminal, a stereo audio signal of left and
right channels as said sound-image controlled
audio signal by convolving said branched audio
signals of said left and right channels corresponding
to said each terminal, respectively, with a pair of
acoustic transfer functions used as said sound
image control parameters, which correspond to a
target position of a sound source different for each of
said N terminals.

21. The audio communication control unit of claim 20,
further comprising a signal processing control part
which detects the number of all terminals participating
in any of said Q teleconferences by detecting
signals requesting to participate in said common
teleconferences from said terminals, determines
target positions of the same number as said number
of conference participating terminals, determines
said pairs of acoustic transfer functions as said
sound image control parameters corresponding to
said determined target positions and provides said
determined pairs of acoustic transfer functions to
said sound image control parts, respectively.

22. The audio communication control unit of claim 14,
wherein upon each change in the number of
conference participating terminals by a request to
participate in or leave each of said teleconferences,
said signal processing control part updates said
target positions according to the new number of
conference participating terminals and updates said
pairs of transfer functions according to said updated
target positions and sets said updated pairs of
transfer functions in said sound image control parts.

23. The audio communication control unit of claim 21,
wherein, letting the number of terminals participating
in all teleconferences be represented by M, said
signal processing control part determines said
target positions for all conference participating
terminals to be symmetric left-right positions at intervals
of 180/(M-1) degrees.

24. The audio communication control unit of claim 19,
wherein said K channels are left and right two
channels and said Q sound image control parts
corresponding to said teleconferences each generate,
for each terminal, a stereo audio signal of left and
right channels as said sound-image controlled
audio signal by convolving a pair of acoustic transfer
functions as said sound image control parameters,
which correspond to a target position of a sound
source different for each of said N terminals, into
said branched audio signals of said left and right
channels corresponding to said each terminal,
respectively.

25. The audio communication control unit of claim 24,
further comprising a signal processing control part
which detects the number of terminals participating
in each of said teleconferences by detecting signals
requesting to participate in said each
teleconference from said N terminals, determines pairs of
acoustic transfer functions corresponding to target
positions of the same number as said detected
number of participating terminals for said each
teleconference and sets said determined pairs of
acoustic transfer functions in that one of said sound
image control parts corresponding to said each
teleconference.

26. The audio communication control unit of claim 25,
wherein, letting the number of terminals participating
in each teleconference be represented by M_P,
said signal processing control part determines said
target positions for all conference participating
terminals to be symmetric left-right positions at
intervals of 180/(M_P-1) degrees.

27. The audio communication control unit of claim 25, 
wherein upon each change in the number of termi- 
nals participating in any one of said teleconferences 
by a request to participate in or leave said any one 
teleconference, said signal processing control part 
updates said target positions of said any one tel- 
econference according to the new number of con- 
ference participating terminals, updates said pair of 
transfer functions according to said updated target 
positions and sets said updated pair of transfer 
functions in that one of said sound image control 
parts corresponding to said any one teleconfer- 
ence. 

28. The audio communication control unit of claim 1,
further comprising: N switches which are inserted
respectively in said N input channels and pass
therethrough or interrupt said input audio signals;
and an utterance detection part which decides, from
said input audio signals on said input audio signal
channels, whether terminals corresponding to said
input audio signals are in a speaking state, and
controls said switches so that said switches in said input
channels decided to be in the speaking state pass
said input audio signals and said switches in said
input channels decided not to be in the speaking
state interrupt said input audio signals.